Windows Azure Tables and Queues Deep Dive

SVC09. Windows Azure Tables and Queues Deep Dive. Jai Haridas, Software Design Engineer, Microsoft Corporation. Agenda: Overview of Windows Azure Tables; Patterns and Practices for Windows Azure Tables; Overview of Windows Azure Queues; Patterns and Practices for Windows Azure Queues.

Presentation Transcript


  1. SVC09 Windows Azure Tables and Queues Deep Dive Jai Haridas Software Design Engineer Microsoft Corporation

  2. Agenda • Overview of Windows Azure Tables • Patterns and Practices for Windows Azure Tables • Overview of Windows Azure Queues • Patterns and Practices for Windows Azure Queues • Q&A

  3. Fundamental Storage Abstractions • Tables – Provide structured storage. A table is a set of entities, which contain a set of properties • Queues – Provide reliable storage and delivery of messages for an application • Blobs – Provide a simple interface for storing named files along with metadata for the file • Drives – Provide durable NTFS volumes for Windows Azure applications to use (new)

  4. Windows Azure Tables • Provides Structured Storage • Massively Scalable Tables • Billions of entities (rows) and TBs of data • Can use thousands of servers as traffic grows • Highly Available & Durable • Data is replicated several times • Familiar and Easy-to-use API • ADO.NET Data Services – .NET 3.5 SP1 • .NET classes and LINQ • REST – with any platform or language

  5. Table Storage Concepts [Diagram: Accounts → Tables → Entities. The account moviesonline holds the tables Users (entities with Email = …, Name = …) and Movies (entities with Genre = …, Title = …).]

  6. Table Data Model • Table • A storage account can create many tables • Table name is scoped by account • Set of entities (i.e. rows) • Entity • Set of properties (columns) • Required properties • PartitionKey, RowKey and Timestamp

  7. Required Entity Properties • PartitionKey & RowKey • Together uniquely identify an entity • Define the sort order • Use them to scale your application • Timestamp • Read only • Used for optimistic concurrency
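Slide 7 lists Timestamp as the basis for optimistic concurrency. The C# sketch below illustrates only the idea and is not the storage client API. A hypothetical in-memory dictionary stands in for a table, and a version counter plays the role of the service-maintained Timestamp, which the REST API surfaces as an entity ETag for conditional updates. An update is accepted only if the caller still holds the latest version. Otherwise the caller should re-read the entity and retry.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a table: (PartitionKey, RowKey) -> (value, version).
// The version plays the role of the service-maintained Timestamp/ETag.
var store = new Dictionary<(string Pk, string Rk), (string Value, long Version)>();
long clock = 0;

// Insert or overwrite unconditionally; the store assigns a new version.
void Upsert(string pk, string rk, string value) =>
    store[(pk, rk)] = (value, ++clock);

// Update only if the caller still holds the latest version (optimistic concurrency).
// Returns false when another writer got there first; the caller should re-read and retry.
bool TryUpdate(string pk, string rk, string value, long expectedVersion)
{
    if (!store.TryGetValue((pk, rk), out var current) || current.Version != expectedVersion)
        return false;
    store[(pk, rk)] = (value, ++clock);
    return true;
}

Upsert("Action", "The Bourne Ultimatum", "rating=4");
long seen = store[("Action", "The Bourne Ultimatum")].Version;

bool first = TryUpdate("Action", "The Bourne Ultimatum", "rating=5", seen);  // succeeds
bool second = TryUpdate("Action", "The Bourne Ultimatum", "rating=3", seen); // stale version, rejected
Console.WriteLine($"{first} {second}"); // True False
```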

  8. PartitionKey And Partitions • PartitionKey • Used to group entities in the table into partitions • A table partition • All entities with the same partition key value • Unit of scale • Control entity locality • RowKey provides uniqueness within a partition

  9. Partitions and Partition Ranges [Diagram: Server A initially serves the whole Movies table; the table is then split into partition ranges, with Server A serving Movies [Action – Comedy) and Server B serving Movies [Comedy – Western).]

  10. Table Operations • Table • Create • Query • Delete • Entities • Insert • Update • Merge – Partial Update • Replace – Update entire entity • Delete • Query • Entity Group Transaction (new)

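  11. Table Schema: define the schema as a .NET class

      using System;
      using System.Data.Services.Common;

      [DataServiceKey("PartitionKey", "RowKey")]
      public class Movie
      {
          // Parameterless constructor, needed when the client materializes query results
          public Movie() { }

          // Used by the sample code on the next slide: new Movie("Action", title)
          public Movie(string category, string title)
          {
              PartitionKey = category;
              RowKey = title;
          }

          /// <summary>
          /// Category is the partition key
          /// </summary>
          public string PartitionKey { get; set; }

          /// <summary>
          /// Title is the row key
          /// </summary>
          public string RowKey { get; set; }

          public DateTime Timestamp { get; set; }
          public int ReleaseYear { get; set; }
          public string Language { get; set; }
          public string Cast { get; set; }
      }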

  12. Table SDK Sample Code

      using System.Linq;
      using Microsoft.WindowsAzure;
      using Microsoft.WindowsAzure.StorageClient;

      StorageCredentialsAccountAndKey credentials = new StorageCredentialsAccountAndKey(
          "myaccount", "myKey");
      string baseUri = "http://myaccount.table.core.windows.net";
      CloudTableClient tableClient = new CloudTableClient(baseUri, credentials);
      tableClient.CreateTable("Movies"); // throws if the table exists; CreateTableIfNotExist avoids that

      TableServiceContext context = tableClient.GetDataServiceContext();
      CloudTableQuery<Movie> q =
          (from movie in context.CreateQuery<Movie>("Movies")
           where movie.PartitionKey == "Action" && movie.RowKey == "The Bourne Ultimatum"
           select movie).AsTableServiceQuery<Movie>();
      Movie movieToUpdate = q.FirstOrDefault();

      // Update movie (change its properties first); FirstOrDefault returns null if not found
      if (movieToUpdate != null)
      {
          context.UpdateObject(movieToUpdate);
          context.SaveChangesWithRetries();
      }

      // Add movie: AddObject needs the table (entity set) name; movieToAdd is the new title (RowKey)
      context.AddObject("Movies", new Movie("Action", movieToAdd));
      context.SaveChangesWithRetries();

  13. Agenda • Overview of Windows Azure Tables • Patterns and Practices for Windows Azure Tables • Overview of Windows Azure Queues • Patterns and Practices for Windows Azure Queues • Q & A

  14. Key Selection: Things to Consider • Scalability • Distribute load as much as possible • Hot partitions can be load balanced • PartitionKey is critical for scalability • Query Efficiency & Speed • Avoid frequent large scans • Parallelize queries • Entity group transactions (new) • Transactions across a single partition • Transaction semantics & reduced round trips

  15. Key Selection: Case Study 1 • Table for listing all movies • Home page lists movies based on chosen category

  16. Movie Listing – Solution 1 • Why do I need multiple PartitionKeys? • Account name as PartitionKey • Movie title as RowKey since movie names need to be sorted • Category as a separate property • Does this scale?

  17. Movie Listing – Solution 1 • Single partition – entire table served by one server • All requests served by that single server • Does not scale [Diagram: requests from all clients go to Server A.]

  18. Movie Listing – Solution 2 • All movies partitioned by category • Allows system to load balance hot partitions • Load distributed • Better than single partition [Diagram: client requests are spread across Server A and Server B.]

  19. Key Selection: Case Study 2 • Log every transaction into a table for diagnostics • Scale a write-intensive scenario • Logs can be retrieved for a given time range

  20. Logging – Solution 1 • Timestamp as PartitionKey • Looks like an obvious choice • Not a single partition, since the key changes as time moves forward, but • Writes are append only • All requests go to a single partition range • Load balancing does not help • Server may throttle [Diagram: all requests from the application clients land on Server A; Server B receives none.]

  21. Logging Solution 2 – Distribute “Append Only” • Prefix the timestamp so that load is distributed, e.g. with • Id of the node logging • Hash into N buckets • Write load is now distributed • Better throughput • To query logs in a time range • Parallelize the query across prefix values [Diagram: requests from the application clients are spread across Server A and Server B.]
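The bucketing idea from slide 21 can be sketched in C#. The following details are illustrative assumptions and are not part of the talk: the bucket count, the FNV-1a hash, the "NN_yyyyMMddHHmm" key layout and the node id "WebRole_IN_3". Writes from different nodes land in up to N different partition ranges. A time-range read becomes N independent range filters, and these can be issued in parallel. A RowKey, which is not shown here, would still have to make each log entry unique within its partition.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

const int BucketCount = 8; // hypothetical N; tune to the expected write load

// Stable hash of the logging node's id into one of N buckets.
// string.GetHashCode() is randomized per process in .NET Core, so use a fixed FNV-1a hash instead.
int Bucket(string nodeId)
{
    uint hash = 2166136261;
    foreach (char c in nodeId) { hash ^= c; hash *= 16777619; }
    return (int)(hash % BucketCount);
}

// Sortable, culture-independent timestamp (minute granularity, an arbitrary choice here).
string Stamp(DateTime utc) => utc.ToString("yyyyMMddHHmm", CultureInfo.InvariantCulture);

// PartitionKey = "<bucket>_<timestamp>": the bucket prefix spreads appends across N ranges,
// and the timestamp keeps keys ordered in time within each bucket.
string LogPartitionKey(string nodeId, DateTime utc) => $"{Bucket(nodeId):D2}_{Stamp(utc)}";

// A time-range query becomes one [From, To) range per bucket; each can run as a separate query.
IEnumerable<(string From, string To)> RangesFor(DateTime fromUtc, DateTime toUtc) =>
    Enumerable.Range(0, BucketCount)
              .Select(b => ($"{b:D2}_{Stamp(fromUtc)}", $"{b:D2}_{Stamp(toUtc)}"));

var t = new DateTime(2009, 11, 18, 16, 30, 0, DateTimeKind.Utc);
Console.WriteLine(LogPartitionKey("WebRole_IN_3", t));
foreach (var r in RangesFor(t, t.AddHours(1)))
    Console.WriteLine($"PartitionKey ge '{r.From}' and PartitionKey lt '{r.To}'");
```

Hashing the node id keeps each node's entries together in one bucket. Hashing a per-entry value instead would spread even a single busy node's writes, at the cost of always querying all N buckets.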

  22. Key Selection: Query Efficiency & Speed • Select keys that allow fast retrieval • Reduce scan range • Reduce scan frequency

  23. Single Entity Query • Where PartitionKey = ‘SciFi’ and RowKey = ‘Star Trek’ • Efficient processing • No continuation tokens [Diagram: the client sends one request and receives one result from the server holding that partition.]

  24. Table Scan Query • Select * from Movies where Rating > 4 • Returns a continuation token when • 1000 movies are in the result set • A partition range boundary is hit • Serial processing: wait for each continuation token before sending the next request [Diagram: Server A returns 1000 movies, or hits a partition range boundary, together with a continuation token; the client keeps sending requests with the latest token, and the query continues on Server B.]

  25. Make Scans Faster • Split “Select * from Movies where Rating > 4” into • Where PartitionKey >= “A” and PartitionKey < “D” and Rating > 4 • Where PartitionKey >= “D” and PartitionKey < “I” and Rating > 4 • Etc. • Execute in parallel • Each query handles its own continuation tokens [Diagram: the client sends the range queries in parallel to Server A and Server B, and each query follows its own continuation tokens.]
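The following is a minimal sketch of the split on slide 25. It assumes the caller already knows reasonable split points, here the hypothetical boundaries "D", "I" and "P". The helper only builds one Table service $filter string per PartitionKey range, using the OData comparison operators ge, lt and gt. Each filter would then run as its own query with its own continuation loop. The first and last ranges are left open-ended so that keys below "A" or past the last boundary are not skipped.

```csharp
using System;
using System.Collections.Generic;

// Build one filter per PartitionKey range. Boundaries are hypothetical: pick them so each range
// holds a similar share of the data. A null bound means "unbounded" on that side.
List<string> SplitScan(string[] boundaries, string extraPredicate)
{
    var filters = new List<string>();
    for (int i = 0; i <= boundaries.Length; i++)
    {
        string lower = i == 0 ? null : boundaries[i - 1];
        string upper = i == boundaries.Length ? null : boundaries[i];
        var parts = new List<string>();
        if (lower != null) parts.Add($"PartitionKey ge '{lower}'");
        if (upper != null) parts.Add($"PartitionKey lt '{upper}'");
        parts.Add(extraPredicate);
        filters.Add(string.Join(" and ", parts));
    }
    return filters;
}

foreach (var f in SplitScan(new[] { "D", "I", "P" }, "Rating gt 4"))
    Console.WriteLine(f);
// PartitionKey lt 'D' and Rating gt 4
// PartitionKey ge 'D' and PartitionKey lt 'I' and Rating gt 4
// PartitionKey ge 'I' and PartitionKey lt 'P' and Rating gt 4
// PartitionKey ge 'P' and Rating gt 4
```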

  26. Query Speed • Fast • Single PartitionKey and RowKey with equality • Medium • Single partition but a small range for RowKey • Entire partition or table that is small • Slow • Large single scan • Large table scan • “OR” predicates on keys => no query optimization => results in a scan • Expect continuation tokens for all except the Fast case

  27. Make Queries Faster • Large Scans • Split the range and parallelize queries • Create and maintain your own views that help queries • “OR” Predicates • Execute individual queries in parallel instead of using “OR” • User Interactive • Cache the result to reduce scan frequency
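Slide 27 advises replacing "OR" predicates with individual queries. A hypothetical in-memory dictionary and a simulated delay stand in for the service here, so the code shows only the shape of the approach. Each key gets one fast point query (PartitionKey and RowKey equality), and all of them are awaited together with Task.WhenAll. This replaces a single OR-ed query that the service would evaluate as a scan. The ratings are made-up sample values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical in-memory table standing in for the service (sample ratings are invented).
var movies = new Dictionary<(string Pk, string Rk), int>
{
    [("SciFi", "Star Trek")] = 5,
    [("Action", "The Bourne Ultimatum")] = 4,
};

// Stand-in for one fast point query: PartitionKey eq ... and RowKey eq ...
async Task<int?> GetRatingAsync(string pk, string rk)
{
    await Task.Delay(10); // simulated network round trip
    return movies.TryGetValue((pk, rk), out var rating) ? rating : (int?)null;
}

// Instead of "(PK eq a and RK eq b) or (PK eq c and RK eq d) or ...", which results in a scan,
// issue one point query per key and await them together.
var keys = new[] { ("SciFi", "Star Trek"), ("Action", "The Bourne Ultimatum"), ("Drama", "Unknown Title") };
int?[] ratings = await Task.WhenAll(keys.Select(k => GetRatingAsync(k.Item1, k.Item2)));
Console.WriteLine(string.Join(", ", ratings.Select(r => r?.ToString() ?? "not found"))); // 5, 4, not found
```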

  28. Expect Continuation Tokens – Seriously! A query can return a continuation token after: • A maximum of 1000 rows in a response • The end of a partition range boundary • A maximum of 5 seconds of query execution
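Any query other than a single-entity lookup may stop after 1000 rows, at a partition range boundary, or after 5 seconds. Client code therefore has to loop until no continuation token comes back. The sketch below simulates that loop. ExecuteSegment is a hypothetical stand-in for one request to the service. An integer offset plays the role of the real continuation token, which the REST API returns in the x-ms-continuation-NextPartitionKey and x-ms-continuation-NextRowKey response headers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one query request: returns at most pageSize rows starting at the
// continuation token, plus the next token (null when the result set is exhausted).
// The real service may also stop early at a partition range boundary or after 5 seconds,
// returning fewer rows (possibly none) together with a token.
(List<int> Rows, int? Next) ExecuteSegment(IReadOnlyList<int> table, int? token, int pageSize)
{
    int start = token ?? 0;
    var rows = table.Skip(start).Take(pageSize).ToList();
    int end = start + rows.Count;
    return (rows, end < table.Count ? end : (int?)null);
}

// Keep following the token until it is null; only a missing token means the query is complete.
List<int> QueryAll(IReadOnlyList<int> table, int pageSize)
{
    var all = new List<int>();
    int? token = null;
    do
    {
        var (rows, next) = ExecuteSegment(table, token, pageSize);
        all.AddRange(rows);
        token = next;
    } while (token != null);
    return all;
}

var data = Enumerable.Range(1, 2500).ToList();
Console.WriteLine(QueryAll(data, 1000).Count); // 2500, fetched in three requests
```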

  29. Entity Group Transactions (EGT) (new) • Atomically perform multiple insert/update/delete operations over entities in the same partition in a single transaction • Maximum of 100 commands in a single transaction, and payload < 4 MB • ADO.NET Data Services • Use SaveChangesOptions.Batch
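Before a batch is submitted with SaveChangesOptions.Batch, it can help to check the constraints on slide 29 on the client side. This is a hypothetical helper, not a library call. In particular, the caller has to supply the payload size as an estimate, since the exact request size depends on serialization.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical pre-flight check for an entity group transaction: every entity in the same
// partition, at most 100 operations, payload under 4 MB. Returns null when the batch looks valid,
// otherwise a short reason. payloadBytes is an estimate supplied by the caller.
string ValidateBatch(IReadOnlyList<(string PartitionKey, string RowKey)> ops, long payloadBytes)
{
    const int MaxOps = 100;
    const long MaxPayload = 4L * 1024 * 1024;
    if (ops.Count == 0) return "empty batch";
    if (ops.Count > MaxOps) return $"too many operations: {ops.Count} > {MaxOps}";
    if (ops.Select(o => o.PartitionKey).Distinct().Count() != 1) return "entities span more than one partition";
    if (payloadBytes >= MaxPayload) return "payload must be smaller than 4 MB";
    return null;
}

var ok = new List<(string, string)> { ("acct42", "A_acct42"), ("acct42", "R_Star Trek") };
Console.WriteLine(ValidateBatch(ok, 2048) ?? "valid"); // valid
```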

  30. Key Selection: Entity Group Transaction • Case Study • Maintain user account information • Account ID, User Name, Address, Number of rentals • Maintain information on checked-out rentals • Account ID, Movie Title, Check-out date, Due date • Solution 1 – Maintain two tables – Users & Rentals • Handle cross-table consistency • Insert into Rentals table succeeds • Update to Users table fails • Use a queue to maintain consistency

  31. Solution 2 • Store Account Information and Rental details in the same table • Use the same PartitionKey so that updates can be transactional • Account ID as PartitionKey • Update total count and insert new rentals using an Entity Group Transaction • Prefix RowKey with “Kind” code: A = Account, R = Rental • RowKey for account info: [Kind Code]_[AccountId] • RowKey for rental info: [Kind Code]_[Title] • Rental properties not set for Account row and vice versa
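The RowKey scheme on slide 31 fits in a couple of helpers. The helper names are hypothetical. The filter string is a sketch of a Table service $filter that fetches only the rental rows of one account. It relies on all "R_" keys sorting together within the account's partition.

```csharp
using System;

// Hypothetical helpers for the single-table design: the account row and its rental rows share
// PartitionKey = accountId, so an entity group transaction can update them together.
string AccountRowKey(string accountId) => "A_" + accountId;
string RentalRowKey(string title) => "R_" + title;

// "A_" sorts before "R_", so the account row comes first in its partition and all rentals form one
// contiguous RowKey range. '`' is the character right after '_', so RowKey lt 'R`' ends the range.
// Real ids/titles would need escaping: quotes inside filters, and '/', '\', '#', '?' are not allowed in keys.
string RentalsFilter(string accountId) =>
    $"PartitionKey eq '{accountId}' and RowKey ge 'R_' and RowKey lt 'R`'";

Console.WriteLine(AccountRowKey("acct42"));   // A_acct42
Console.WriteLine(RentalRowKey("Star Trek")); // R_Star Trek
Console.WriteLine(RentalsFilter("acct42"));   // PartitionKey eq 'acct42' and RowKey ge 'R_' and RowKey lt 'R`'
```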

  32. Best Practices & Summary • Select PartitionKey and RowKey that help scale • Efficient for frequently used queries • Supports batch transactions • Distributes load • Distribute “Append only” patterns using a prefix to PartitionKey • Always handle continuation tokens • Clients can maintain their own cache/views instead of frequent scans • Future feature – Secondary Index • Execute parallel queries instead of “OR” predicates • Implement a back-off strategy for retries
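A back-off strategy like the one recommended on slide 32 can look roughly like the following C# sketch. It combines exponential growth from a base delay, a cap, and random jitter so that many clients do not retry in lockstep. Several details are assumptions for illustration: the base delay, the cap, the attempt limit, and the use of TimeoutException as the "retryable" signal. Real code would decide retryability from the service's error response.

```csharp
using System;
using System.Threading;

// Exponential back-off with jitter: roughly base * 2^attempt, capped, then randomized into
// [exp/2, exp). The default numbers are hypothetical, not service guidance.
TimeSpan BackoffDelay(int attempt, Random rng, double baseMs = 100, double maxMs = 30_000)
{
    double exp = Math.Min(maxMs, baseMs * Math.Pow(2, attempt));
    return TimeSpan.FromMilliseconds(exp / 2 + rng.NextDouble() * exp / 2);
}

// Retry an operation whose transient failures (e.g. server busy/throttled) surface as
// TimeoutException in this sketch; give up after maxAttempts by letting the exception propagate.
T WithRetries<T>(Func<T> op, int maxAttempts, Random rng)
{
    for (int attempt = 0; ; attempt++)
    {
        try { return op(); }
        catch (TimeoutException) when (attempt + 1 < maxAttempts)
        {
            Thread.Sleep(BackoffDelay(attempt, rng));
        }
    }
}

var random = new Random();
int attempts = 0;
int result = WithRetries(() => { if (++attempts < 3) throw new TimeoutException(); return attempts; }, 5, random);
Console.WriteLine(result); // 3
```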

  33. Agenda • Overview of Windows Azure Tables • Patterns and Practices for Windows Azure Tables • Overview of Windows Azure Queues • Patterns and Practices for Windows Azure Queues • Q & A

  34. Windows Azure Queues • Queues are performance efficient, highly available, and provide reliable message delivery • Simple, asynchronous work dispatch • Programming semantics ensure that every message is processed at least once • Access is provided via REST

  35. Queue Storage Concepts [Diagram: Accounts → Queues → Messages. The account sally holds the queues thumbnailjobs (messages such as “128 x 128 http://...” and “256 x 256 http://...”) and traverselinks (messages such as “http://...”).]

  36. Account, Queues and Messages • An account can create many queues • Queue name is scoped by the account • A queue contains messages • No limit on the number of messages stored in a queue • Messages expire after a set time limit • Messages • Message size <= 8 KB • To store larger data, store the data in blob/entity storage and put the blob/entity name in the message • Messages now carry a dequeue count
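Slide 36 advises storing larger data outside the message. To follow that advice, a producer needs a size check before deciding between inline content and a blob reference. This sketch assumes the message text is Base64-encoded on the wire, which inflates it by about a third. That is an assumption about the client library and should be checked against the one in use. uploadToBlob and the "blob:" prefix are hypothetical placeholders for the application's own blob upload and naming scheme.

```csharp
using System;
using System.Text;

// 8 KB message limit from the slide.
const int MaxMessageBytes = 8 * 1024;

// Assumes Base64 encoding on the wire: n bytes become 4 * ceil(n / 3) characters.
bool FitsInMessage(string payload) =>
    4 * ((Encoding.UTF8.GetByteCount(payload) + 2) / 3) <= MaxMessageBytes;

// Small payloads travel inline; large ones are uploaded and only a reference goes in the message.
// uploadToBlob is a hypothetical callback returning the blob name it stored the data under.
string MessageBodyFor(string payload, Func<string, string> uploadToBlob) =>
    FitsInMessage(payload) ? payload : "blob:" + uploadToBlob(payload);

Console.WriteLine(MessageBodyFor("resize 128 x 128", _ => "unused"));           // resize 128 x 128
Console.WriteLine(MessageBodyFor(new string('x', 9000), _ => "jobs/job-0001")); // blob:jobs/job-0001
```

Blobs written this way need to be deleted once their message has been processed. Otherwise a crashed producer or consumer can leave orphans, which is why slide 43 recommends garbage-collecting them.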

  37. Queue Operations • Queue • Create Queue • Delete Queue • List Queues • Get/Set Queue Metadata • Messages • Add Message (i.e. Enqueue Message) • Get Message(s) (i.e. Dequeue Message) • Peek Message(s) • Delete Message

  38. Queue Programming API

      using System;
      using Microsoft.WindowsAzure.StorageClient;

      // baseUri is the queue endpoint here, e.g. http://myaccount.queue.core.windows.net
      CloudQueueClient queueClient = new CloudQueueClient(baseUri, credentials);
      CloudQueue queue = queueClient.GetQueueReference("test1");
      queue.CreateIfNotExist();

      // The approximate message count is populated via FetchAttributes
      queue.FetchAttributes();

      CloudQueueMessage message = new CloudQueueMessage("Some content");
      queue.AddMessage(message);

      message = queue.GetMessage(TimeSpan.FromMinutes(10) /* visibility timeout */);
      if (message != null) // GetMessage returns null when the queue is empty
      {
          // Process the message here …
          queue.DeleteMessage(message);
      }

  39. Agenda • Overview of Windows Azure Tables • Patterns and Practices for Windows Azure Tables • Overview of Windows Azure Queues • Patterns and Practices for Windows Azure Queues • Q & A

  40. Removing Poison Messages [Diagram: producers P1 and P2 add messages to queue Q, and consumers C1 and C2 read them; each message is shown with its dequeue count.] • 1. C1: GetMessage(Q, 30 s) → msg 1 • 2. C2: GetMessage(Q, 30 s) → msg 2

  41. Removing Poison Messages [Diagram: producers P1 and P2, queue Q, consumers C1 and C2, with each message's dequeue count.] • 1. C1: GetMessage(Q, 30 s) → msg 1 • 2. C2: GetMessage(Q, 30 s) → msg 2 • 3. C2 consumed msg 2 • 4. C2: DeleteMessage(Q, msg 2) • 5. C1 crashed • 6. msg 1 visible again 30 s after dequeue • 7. C2: GetMessage(Q, 30 s) → msg 1

  42. Removing Poison Messages [Diagram: producers P1 and P2, queue Q, consumers C1 and C2, with each message's dequeue count.] • 1. C1: Dequeue(Q, 30 sec) → msg 1 • 2. C2: Dequeue(Q, 30 sec) → msg 2 • 3. C2 consumed msg 2 • 4. C2: Delete(Q, msg 2) • 5. C1 crashed • 6. msg 1 visible again 30 s after dequeue • 7. C2: Dequeue(Q, 30 sec) → msg 1 • 8. C2 crashed • 9. msg 1 visible again 30 s after dequeue • 10. C1 restarted • 11. C1: Dequeue(Q, 30 sec) → msg 1 • 12. DequeueCount > 2 • 13. C1: Delete(Q, msg 1)
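The walkthrough on slides 40 to 42 can be reproduced with a small in-memory simulation. Nothing here calls the queue service. A Queue<T> stands in for Q, and re-enqueueing a failed message stands in for its visibility timeout expiring. The threshold of 2 matches the slide's "DequeueCount > 2" check. The poison message is removed after its third dequeue, while msg2 is processed normally.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory queue. TryGet hands out the next message with its dequeue count
// incremented; a message that is not deleted is re-enqueued, standing in for its visibility
// timeout expiring. Nothing here is the storage client API.
var queue = new Queue<(string Id, string Body, int DequeueCount)>();
var deadLetters = new List<string>();
const int MaxDequeueCount = 2; // as on the slide: delete once DequeueCount > 2

bool TryGet(out (string Id, string Body, int DequeueCount) message)
{
    if (queue.Count == 0) { message = default; return false; }
    var m = queue.Dequeue();
    message = (m.Id, m.Body, m.DequeueCount + 1);
    return true;
}

// Stand-in for the consumer's work; the "poison" body always fails, like C1 and C2 crashing.
void Process(string body)
{
    if (body == "poison") throw new InvalidOperationException("consumer fails on this message");
}

queue.Enqueue(("msg1", "poison", 0));
queue.Enqueue(("msg2", "ok", 0));

while (TryGet(out var msg))
{
    if (msg.DequeueCount > MaxDequeueCount)
    {
        deadLetters.Add(msg.Id); // record it for later inspection, then delete it
        continue;                // not re-enqueued: models DeleteMessage
    }
    try
    {
        Process(msg.Body);       // success: not re-enqueued, i.e. deleted after processing
    }
    catch (InvalidOperationException)
    {
        queue.Enqueue(msg);      // failure: the message reappears later with a higher count
    }
}
Console.WriteLine(string.Join(",", deadLetters)); // msg1
```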

  43. Best Practices & Summary • Make message processing idempotent • Then failures need no special handling: processing a message twice does no harm • Do not rely on order • Messages that become visible again after their invisibility timeout are delivered out of order • Use dequeue count to remove poison messages • Enforce a threshold on each message’s dequeue count • Use message count to dynamically increase/reduce workers • Use blobs to store message data, with a reference in the message • For messages > 8 KB • To batch messages • Garbage-collect orphaned blobs
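Slide 43 recommends using the message count to scale workers. One simple policy is one worker per fixed number of queued messages, clamped to a range. The helper and its parameters are hypothetical. Its input would come from the queue's approximate message count, which slide 38 fetches via FetchAttributes, so the result is a hint rather than an exact value.

```csharp
using System;

// Hypothetical sizing rule: one worker per messagesPerWorker queued messages (rounded up),
// clamped to [minWorkers, maxWorkers].
int DesiredWorkers(int approximateMessageCount, int messagesPerWorker, int minWorkers, int maxWorkers)
{
    int wanted = (approximateMessageCount + messagesPerWorker - 1) / messagesPerWorker; // ceiling division
    return Math.Clamp(wanted, minWorkers, maxWorkers);
}

Console.WriteLine(DesiredWorkers(0, 100, 1, 20));      // 1
Console.WriteLine(DesiredWorkers(450, 100, 1, 20));    // 5
Console.WriteLine(DesiredWorkers(10_000, 100, 1, 20)); // 20
```

In practice the count fluctuates. Changing the worker count only when the target stays different for a while avoids adding and removing workers on every poll.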

  44. Future Features • Allow workers to extend invisibility time • Time to process message unknown at dequeue time • Worker can extend the time as needed • Allow longer invisibility time • Long-running work items may need more than 2 hours • Allow messages to not expire • Large backlogs will not cause messages to expire

  45. Takeaways • Table • Scalable & Reliable Structured Storage System • Partitioning is critical to scalability • Entity Group Transactions (new) • Queue • Scalable & Reliable Messaging System • Dequeue count returned with message (new) • Use back-off strategy on retries • Official Storage Client Library (new)

  46. Windows Azure Session Alerts!! • Storing and Manipulating Blobs and Files with Windows Azure Storage – 11/18 (4:30 PM) • Patterns for building Reliable & Scalable Applications with Windows Azure – 11/19 (8:30 AM) • Automating the Application Lifecycle with Windows Azure – 11/19 (10:00 AM)

  47. Q&A

  48. Windows Azure PDC Swag

  49. YOUR FEEDBACK IS IMPORTANT TO US! Please fill out session evaluation forms online at MicrosoftPDC.com

  50. Learn More On Channel 9 • Expand your PDC experience through Channel 9 • Explore videos, hands-on labs, sample code and demos through the new Channel 9 training courses channel9.msdn.com/learn Built by Developers for Developers…
