Windows Azure Storage. Brad Calder Director/Architect Microsoft Corporation. Windows Azure. Windows Azure is the foundation of Microsoft’s Cloud Platform It is an “Operating System for the Cloud” and provides Essential Services for the Cloud Virtualized Computation Scalable Storage
Windows Azure Storage Brad Calder Director/Architect Microsoft Corporation
Windows Azure • Windows Azure is the foundation of Microsoft’s Cloud Platform • It is an “Operating System for the Cloud” and provides Essential Services for the Cloud • Virtualized Computation • Scalable Storage • Automatic Management • Developer SDK
Azure™ Services Platform • Windows Azure Storage and SQL Data Services are different storage offerings for the Azure Services Platform
Windows Azure Storage • The goal is to allow users and services to • Access their data anywhere, at any time • Store data for any length of time • Scale to store any amount of data • Be confident that the data will not be lost • Pay only for what they use/store
Windows Azure Storage • Storage • Durable • Scalable (capacity and throughput) • Highly available • Rich storage concepts • Large user data items: blobs • Service state: tables • Service communication: queues • Simple and familiar programming interfaces • REST (HTTP and HTTPS) • .NET accessible
Windows Azure Storage Account • User creates a globally unique storage account name • Receives a 256-bit secret key when creating the account • Geo-location and co-location coming soon • Can choose geo-location to host storage account • Example: “US Northwest”, “US Southwest” • Can co-locate storage account with compute account • Provides security for accessing the store • Use the secret key to create an HMAC SHA256 signature for each request • The server uses the signature to authenticate the request
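The request-signing step on this slide can be sketched as follows. This is a minimal illustration, not the service's actual signing routine: the key is made up, and the `demo_string` layout (verb, resource path, date) is a simplified placeholder for whatever canonical string the service really signs. Only the HMAC SHA256 primitive and the `SharedKey <account>:<signature>` header shape come from the slides.

```python
import base64
import hashlib
import hmac


def sign_request(secret_key_b64: str, string_to_sign: str) -> str:
    """Return the Base64-encoded HMAC-SHA256 of string_to_sign."""
    key = base64.b64decode(secret_key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(account: str, secret_key_b64: str, string_to_sign: str) -> str:
    """Build a header value shaped like the one in the REST examples."""
    return f"SharedKey {account}:{sign_request(secret_key_b64, string_to_sign)}"


def verify_request(secret_key_b64: str, string_to_sign: str, signature: str) -> bool:
    """Server side: recompute the signature and compare in constant time."""
    expected = sign_request(secret_key_b64, string_to_sign)
    return hmac.compare_digest(expected, signature)


# Hypothetical 256-bit key and a simplified, assumed string-to-sign.
demo_key = base64.b64encode(bytes(range(32))).decode("ascii")
demo_string = "PUT\n/dvd/movies/TheBlob.wmv\nMon, 27 Oct 2008 17:00:25 GMT"
demo_signature = sign_request(demo_key, demo_string)
print(authorization_header("dvd", demo_key, demo_string))
```

Because the date is part of the signed string in this sketch, changing any signed field invalidates the signature, which is the property the server relies on.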
Fundamental Data Abstractions • Blobs – Provide a simple interface for storing named files along with metadata for the file • Tables – Provide structured storage. A Table is a set of entities, which contain a set of properties • Queues – Provide reliable storage and delivery of messages for an application
Blob Storage Concepts • Account → Container → Blob • Account: sally • Container pictures: IMG001.JPG, IMG002.JPG • Container movies: MOV1.AVI
Storage Account and Blob Containers • Storage account • An account can have many blob containers • Container • A container is a set of blobs • Sharing policies are set at the container level • Public READ or Private • Associate metadata with container • Metadata is <name, value> pairs • Up to 8KB per container • List the blobs in a container
Blob Namespace • Blob URL http(s)://<Account>.blob.core.windows.net/<Container>/<BlobName> Example: • Account – sally • Container – music • BlobName – rock/rush/xanadu.mp3 • URL: http://sally.blob.core.windows.net/music/rock/rush/xanadu.mp3
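The URL scheme above can be expressed as a small helper. The function name `blob_url` is hypothetical; the host pattern and the example values come from the slide.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob_name: str, https: bool = False) -> str:
    """Compose http(s)://<Account>.blob.core.windows.net/<Container>/<BlobName>."""
    scheme = "https" if https else "http"
    # Keep "/" unescaped so blob names such as rock/rush/xanadu.mp3 read as paths.
    return f"{scheme}://{account}.blob.core.windows.net/{container}/{quote(blob_name, safe='/')}"


print(blob_url("sally", "music", "rock/rush/xanadu.mp3"))
```

The slashes in the blob name are part of the name itself; the namespace has only two real levels, container and blob.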
Blob Features and Functions • Store large objects (up to 50GB for CTP) • Associate metadata with blob • Metadata is <name, value> pairs, up to 8KB per blob • Set/Get with or separate from blob data bits • Standard REST Interface • PutBlob • Inserts a new blob or overwrites the existing blob • GetBlob • Get whole blob or a specific range • DeleteBlob
REST PutBlob • PUT http://dvd.blob.core.windows.net/movies/TheBlob.wmv HTTP/1.1 (account: dvd, container: movies, blob name: TheBlob.wmv) • Content-Length: 10000000000 • Content-Type: binary/octet-stream • x-ms-meta-year: 1958 • x-ms-meta-tagline: Beware%20of%20the%20Blob • Content-MD5: HUXZLQLMuI/KZ5KDcJPcOA== • x-ms-date: Mon, 27 Oct 2008 17:00:25 GMT • Authorization: SharedKey dvd:F5a+dUDvef+PfMb4T8Rc2jHcwfK58KecSZY+l2naIao= • ……… Blob Data Contents ………
REST GetBlob • Get a range of the blob • GET http://dvd.blob.core.windows.net/movies/TheBlob.wmv HTTP/1.1 • Range: bytes=1024000-2048000 • Get whole blob • GET http://dvd.blob.core.windows.net/movies/TheBlob.wmv HTTP/1.1 • Authorization: SharedKey dvd:RGllHMtzKMi4y/nedSk5Vn74IU6/fRMwiPsL+uYSDjY= • x-ms-date: Mon, 27 Oct 2008 17:00:25 GMT
Blob Storage Concepts: Adding Blocks • Account → Container → Blob → Block • Account: sally • Container pictures: IMG001.JPG, IMG002.JPG • Container movies: MOV1.AVI • Blocks of MOV1.AVI: Block 1, Block 2, Block 3
Uploading a Blob Via Blocks • Uploading a large blob (example: TheBlob.wmv, a 10 GB movie, split into Block Id 1 … Block Id N and uploaded to Windows Azure Storage) • blobName = "TheBlob.wmv"; • PutBlock(blobName, blockId1, block1Bits); • PutBlock(blobName, blockId2, block2Bits); • ………… • PutBlock(blobName, blockIdN, blockNBits); • PutBlockList(blobName, blockId1,…,blockIdN); • Benefit • Efficient continuation and retry • Parallel and out of order upload of blocks
PutBlockList Example • BlobName = ExampleBlob.wmv • Sequence of operations • PutBlock(BlockId1) • PutBlock(BlockId3) • PutBlock(BlockId4) • PutBlock(BlockId2) • PutBlock(BlockId4) • PutBlockList(BlockId2, BlockId3, BlockId4) • Example uploading • Blocks out of order • Same block IDs • Unused blocks • Committed and readable version of blob: Block Id 2, Block Id 3, Block Id 4
Block Details • Each blob is a list of blocks • Each block can be up to 4 MB • Blocks can vary in size • Each block has a 64 byte ID, scoped by blob name • Block operation • PutBlock • Puts an uncommitted block defined by the block ID for the blob • Block list operations • PutBlockList • Provide the list of blocks to comprise the readable version of the blob • Can use uncommitted blocks only • GetBlockList • Returns the list of committed blocks (not the block data) • Block ID and size of block are returned for each block
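The block operations can be modeled in memory to make the commit semantics concrete. This is a toy sketch, not the service's implementation: the class `BlockBlob` and its method names are hypothetical, and clearing the uncommitted set after a commit is an assumption. It replays the upload sequence from the PutBlockList example (blocks 1, 3, 4, 2, then 4 again, committing 2, 3, 4).

```python
class BlockBlob:
    """Toy in-memory model of PutBlock / PutBlockList / GetBlockList."""

    MAX_BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB per block, per the slide

    def __init__(self):
        self.uncommitted = {}  # block_id -> bytes
        self.committed = []    # ordered list of (block_id, bytes)

    def put_block(self, block_id: str, data: bytes) -> None:
        if len(block_id.encode("utf-8")) > 64:
            raise ValueError("block ID is limited to 64 bytes")
        if len(data) > self.MAX_BLOCK_SIZE:
            raise ValueError("block exceeds 4 MB")
        # Re-uploading the same ID replaces the earlier upload.
        self.uncommitted[block_id] = data

    def put_block_list(self, block_ids) -> None:
        missing = [b for b in block_ids if b not in self.uncommitted]
        if missing:
            raise KeyError(f"unknown uncommitted blocks: {missing}")
        self.committed = [(b, self.uncommitted[b]) for b in block_ids]
        self.uncommitted.clear()  # assumption: leftovers are dropped on commit

    def get_block_list(self):
        # Block ID and size only, not the block data.
        return [(b, len(d)) for b, d in self.committed]

    def get_blob(self) -> bytes:
        return b"".join(d for _, d in self.committed)


blob = BlockBlob()
blob.put_block("id1", b"one")        # uploaded but never committed
blob.put_block("id3", b"three")
blob.put_block("id4", b"four-old")
blob.put_block("id2", b"two")        # arrives out of order
blob.put_block("id4", b"four-new")   # same ID, last upload is used
blob.put_block_list(["id2", "id3", "id4"])
print(blob.get_blob())
```

The order of the readable blob is set by the list passed to `put_block_list`, not by upload order, which is what allows parallel and retried uploads.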
Blob Concurrency • Overlapping GET and PUT of same blob • Snapshot Isolation is provided • A GET will see a single version of the blob • Can get back a connection closed error if the blob changes during a long GET • Perform a conditional GET (to see if it changed) if you want to continue • Concurrent blob updates • First PutBlockList wins • Conditional PUT can be used for optimistic concurrency on commit
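Optimistic concurrency on commit can be illustrated with a version check. This is a sketch under assumptions: the integer `version` stands in for whatever condition the service's conditional PUT evaluates, and `VersionedBlob` is a hypothetical name.

```python
class VersionedBlob:
    """Toy blob whose writes succeed only if the caller saw the latest version."""

    def __init__(self):
        self.version = 0
        self.data = b""

    def get(self):
        return self.version, self.data

    def conditional_put(self, data: bytes, expected_version: int) -> bool:
        if expected_version != self.version:
            return False  # someone else committed first; caller must re-read
        self.data = data
        self.version += 1
        return True


shared = VersionedBlob()
seen_by_a, _ = shared.get()
seen_by_b, _ = shared.get()

first = shared.conditional_put(b"from A", seen_by_a)   # succeeds
second = shared.conditional_put(b"from B", seen_by_b)  # rejected, stale version

# Writer B re-reads and retries against the current version.
seen_by_b, _ = shared.get()
retry = shared.conditional_put(b"from B", seen_by_b)
print(first, second, retry)
```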
Choosing a Block ID • Block ID represents 64 bytes of metadata you can track for each block • Concurrent Blob Updates • For a given Block ID, the last one uploaded will be the one used for a PutBlockList • Use a unique hash of the block contents to represent the Block ID • Ranged GET with data integrity checks • Store data integrity hash as part of Block ID
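One way to follow the content-hash suggestion is sketched below. The choice of SHA-256 is an assumption made for illustration; its hexadecimal digest happens to be exactly 64 ASCII characters, which fits the 64 byte block ID. The function names are hypothetical.

```python
import hashlib


def block_id_for(data: bytes) -> str:
    """Derive a 64-character block ID from the block contents."""
    return hashlib.sha256(data).hexdigest()


def verify_block(block_id: str, data: bytes) -> bool:
    """After a ranged GET of one block, check the data against its ID."""
    return block_id_for(data) == block_id


chunk = b"abc"
chunk_id = block_id_for(chunk)
print(len(chunk_id), verify_block(chunk_id, chunk))
```

With content-derived IDs, two writers that upload different data never collide on an ID, and a reader can validate each block without a separate checksum store.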
Summary Of Windows Azure Blobs • Easy to use REST Put/Get/Delete interface • Can GET any range of the blob • Conditional put and get blob • Blocks provide efficient upload and allow an ID to be associated with every block • Max blob size for CTP • 50 GB using PutBlock and PutBlockList • 64 MB using PutBlob • Future features • Copy blob, update blob, and append blob
Fundamental Data Abstractions • Blobs – Provide a simple interface for storing named files along with metadata for the file • Tables – Provide structured storage. A table is a set of entities, which contain a set of properties • Queues – Provide reliable storage and delivery of messages for an application
Windows Azure Tables • Provides structured storage • Massively scalable tables • Billions of entities (rows) and TBs of data • Automatically scales across servers as traffic grows • Highly Available • Anywhere at Anytime access to your data • Durable • Data is replicated at least 3 times • Familiar and easy to use programming interfaces • ADO.NET Data Services – .NET 3.5 SP1 • .NET classes and LINQ • REST - with any platform or language
Table Storage Concepts • Account → Table → Entity • Account: sally • Table users: entities with Name = …, Email = … • Table photo index: entities with Photo ID = …, Date = …
Table Data Model • Table • A storage account can create many tables • Table name is scoped by account • Data is stored in tables • A table is a set of entities (rows) • An entity is a set of properties (columns) • Entity • Two “key” properties that together are the unique ID of the entity in the table • PartitionKey – enables scalability • RowKey – uniquely identifies the entity within the partition
Partition Key And Partitions • Every table has a partition key • It is the first property (column) of your table • Used to group entities in the table into partitions • A table partition • All entities in a table with the same partition key value • Partition key is exposed in the programming model • Allows application to control the granularity of the partitions and enable scalability
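The relationship between tables, partitions, and entities can be modeled with nested dictionaries. This is an illustrative sketch, not how the service stores data; the class `Table` and the sample rows other than Lee/Geddy are made up.

```python
from collections import defaultdict


class Table:
    """Toy table: entities grouped into partitions by PartitionKey."""

    def __init__(self):
        self.partitions = defaultdict(dict)  # PartitionKey -> {RowKey: entity}

    def insert(self, entity: dict) -> None:
        pk, rk = entity["PartitionKey"], entity["RowKey"]
        if rk in self.partitions[pk]:
            raise KeyError(f"entity ({pk}, {rk}) already exists")
        self.partitions[pk][rk] = entity

    def get(self, partition_key: str, row_key: str) -> dict:
        # (PartitionKey, RowKey) together identify one entity.
        return self.partitions[partition_key][row_key]


customers = Table()
customers.insert({"PartitionKey": "Lee", "RowKey": "Geddy", "Occupation": "Engineer"})
customers.insert({"PartitionKey": "Lee", "RowKey": "Ann", "Occupation": "Teacher"})
customers.insert({"PartitionKey": "Smith", "RowKey": "Ann", "Occupation": "Pilot"})
print(sorted(customers.partitions))
```

The same RowKey ("Ann") appears in two partitions without conflict, because uniqueness is only required within a partition.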
Partition Example • Table Partition – all entities in table with same partition key value • Application controls granularity of partition • [Diagram: a table split into Partition 1 and Partition 2]
Purpose of the Partition Key • Entity locality • Entities in the same partition will be stored together • Efficient querying and cache locality • Entity group transactions (future feature) • Atomically perform multiple insert/update/delete over entities in same partition in a single transaction • Table scalability • We monitor the usage patterns of partitions • Automatically load balance partitions • Each partition can be served by a different storage node • Scale to meet the traffic needs of your table
Choosing a Partition Key • Granularity of entity group transactions • Make the partition key only as big as you need it for entity group transactions • Spread out load across partitions • More partitions – makes it easier to automatically balance load • Currently have one primary index • Important to use a partition key that is common in your queries • If partition key is part of query • Fast access to retrieve entities within a single partition • If partition key is not specified in a query • Then every partition has to be scanned
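The cost difference between the two query shapes can be shown with a small model that counts how many partitions a query touches. The function `query` and the sample data are hypothetical; the real service's query planning is not described in the slides.

```python
def query(partitions: dict, predicate, partition_key=None):
    """Return (matching entities, number of partitions examined)."""
    if partition_key is not None:
        # Partition key known: go straight to one partition.
        candidates = {partition_key: partitions.get(partition_key, {})}
    else:
        # Partition key unknown: every partition must be scanned.
        candidates = partitions
    results = [
        entity
        for rows in candidates.values()
        for entity in rows.values()
        if predicate(entity)
    ]
    return results, len(candidates)


data = {
    "Lee": {"Geddy": {"Occupation": "Engineer"}},
    "Smith": {"Ann": {"Occupation": "Pilot"}},
    "Jones": {"Bob": {"Occupation": "Engineer"}},
}


def is_engineer(entity):
    return entity["Occupation"] == "Engineer"


targeted, touched_one = query(data, is_engineer, partition_key="Lee")
scanned, touched_all = query(data, is_engineer)
print(touched_one, touched_all)
```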
Table Entities and Properties • Each entity can have up to 255 properties • Mandatory properties for every entity in table • Partition key • Row key • All entities have a system maintained version • No fixed schema for rest of properties • Each property is stored as a <name, typed value> pair • No schema stored for a table • 2 entities within the same table can have different properties • Properties can be the standard .NET types • String, binary, bool, DateTime, GUID, int, int64, and double
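The entity rules on this slide can be checked with a short validator. This is a sketch: the mapping from the listed .NET types to Python types is approximate, and counting PartitionKey and RowKey toward the 255-property limit is an assumption.

```python
import datetime
import uuid

MAX_PROPERTIES = 255
# Approximate Python stand-ins for: String, binary, bool, DateTime, GUID, int/int64, double.
ALLOWED_TYPES = (str, bytes, bool, datetime.datetime, uuid.UUID, int, float)


def validate_entity(entity: dict) -> None:
    """Raise if the entity breaks the rules; there is no per-table schema to check."""
    for key in ("PartitionKey", "RowKey"):
        if key not in entity:
            raise ValueError(f"missing mandatory property {key}")
    if len(entity) > MAX_PROPERTIES:
        raise ValueError("too many properties")
    for name, value in entity.items():
        if not isinstance(value, ALLOWED_TYPES):
            raise TypeError(f"unsupported type for property {name}")


# Two entities in the same table may carry different properties.
customer = {"PartitionKey": "Lee", "RowKey": "Geddy", "Rating": 2.0}
order = {"PartitionKey": "Lee", "RowKey": "order-1", "Total": 42, "Paid": True}
validate_entity(customer)
validate_entity(order)
```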
Table Programming Model • Provide familiar and easy to use interfaces • Leverage your .NET expertise • Table entities are accessed as objects via ADO.NET Data Services – .NET 3.5 SP1 • LINQ – Language Integrated Query • RESTful access to table and entities • Insert/Update/Delete entities over the table • Query over tables • Get back a list of structured entities
Example Table Definition • Example using ADO.NET Data Services • Table entities are represented as class objects
using System;
using System.Data.Services.Common;

[DataServiceKey("PartitionKey", "RowKey")]
public class Customer
{
    // Partition key – customer last name
    public string PartitionKey { get; set; }
    // Row key – customer first name
    public string RowKey { get; set; }
    // User defined properties here
    public DateTime CustomerSince { get; set; }
    public double Rating { get; set; }
    public string Occupation { get; set; }
}
Create and Insert Entity • Create a new customer and insert it into the table
// Requires: using System; using System.Data.Services.Client;
Customer cust = new Customer
{
    PartitionKey = "Lee",            // Partition key = last name
    RowKey = "Geddy",                // Row key = first name
    CustomerSince = DateTime.UtcNow, // Customer since
    Rating = 2.0,                    // Rating
    Occupation = "Engineer"          // Occupation
};
// serviceUri is a Uri for "http://<Account>.table.core.windows.net/"
DataServiceContext context = new DataServiceContext(serviceUri);
context.AddObject("Customer", cust);
DataServiceResponse response = context.SaveChanges();
Query a Table • LINQ
DataServiceContext context = new DataServiceContext(new Uri("http://myaccount.table.core.windows.net"));
var customers = from o in context.CreateQuery<Customer>("Customer")
                where o.PartitionKey == "Lee"
                select o;
foreach (Customer customer in customers) { }
• REST
GET http://myaccount.table.core.windows.net/Customer?$filter=PartitionKey eq 'Lee'
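For clients on other platforms, the REST form of the query can be built by hand. The sketch below only constructs the URL and sends nothing; the helper names are hypothetical, and doubling embedded single quotes follows the usual OData string-literal convention, which is assumed to apply here.

```python
from urllib.parse import quote


def partition_filter(value: str) -> str:
    """Build a $filter expression that selects one partition."""
    escaped = value.replace("'", "''")  # assumed OData-style quote escaping
    return f"PartitionKey eq '{escaped}'"


def table_query_url(account: str, table: str, filter_expr: str) -> str:
    """Compose the query URL with the filter percent-encoded."""
    encoded = quote(filter_expr, safe="")
    return f"http://{account}.table.core.windows.net/{table}?$filter={encoded}"


url = table_query_url("myaccount", "Customer", partition_filter("Lee"))
print(url)
```

A real request would also need the date and Authorization headers shown in the blob REST examples.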
.NET and ADO.NET Perf Tips • The default .NET HTTP connection limit is 2 • ServicePointManager.DefaultConnectionLimit = X; • Turn off 100-continue (saves one round trip) • ServicePointManager.Expect100Continue = false; • Turn tracking off for query results that are not going to be modified • MergeOption = MergeOption.NoTracking • To improve performance of ADO.NET de-serialization • Name the entity class the same as the table name, or • Use DataServiceContext.ResolveType to return the type of the entity
Table Tips • Be prepared for partial results from your queries • Check for the continuation token • Storing different types of entities in same table • Have part of the RowKey represent the kind type • In a single query can retrieve all of the related objects of different kinds • When entity group transactions are supported • Can perform transactions across different typed entities in same partition
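Handling partial results amounts to looping until no continuation token comes back. The sketch below uses a fake paging function in place of the service; `fetch_all`, `make_fake_service`, and the integer token are all hypothetical stand-ins.

```python
def fetch_all(fetch_page):
    """Yield every entity, following continuation tokens until none is returned."""
    token = None
    while True:
        entities, token = fetch_page(token)
        yield from entities
        if token is None:
            return


def make_fake_service(rows, page_size):
    """Stand-in for the table service: returns one page plus a continuation token."""
    def fetch_page(token):
        start = 0 if token is None else token
        end = start + page_size
        next_token = end if end < len(rows) else None
        return rows[start:end], next_token
    return fetch_page


rows = [f"entity-{i}" for i in range(7)]
service = make_fake_service(rows, page_size=3)
first_page, first_token = service(None)
everything = list(fetch_all(service))
print(len(first_page), len(everything))
```

A caller that reads only the first response would see 3 of the 7 entities here, which is the failure mode the tip warns about.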
Summary of Windows Azure Tables • Built to provide massively scalable, highly available and durable structured storage • Automatic load balancing and scaling of tables • Partition key is exposed to the application • Familiar and easy to use LINQ and REST programming interfaces via ADO.NET Data Services • Not a relational database • No joins, no maintenance of foreign keys, etc. • If you need relational DB capabilities, use SQL Data Services • Future features • Entity group transactions • Secondary indexes
Fundamental Data Abstractions • Blobs – Provide a simple interface for storing named files along with metadata for the file • Tables – Provide structured storage. A table is a set of entities, which contain a set of properties • Queues – Provide reliable storage and delivery of messages for an application
Web + Worker Role Pattern • Web role • Web farm that handles requests from the internet • Pushes work items onto storage queue • Worker role • Processes work items off storage queue • [Diagram: public internet → load balancer → n web role instances → queue (Q) → m worker role instances, backed by cloud storage (tables, blobs, queues)]
Windows Azure Queues • Provide reliable message delivery • Simple, asynchronous work dispatch • Programming semantics ensure that a message can be processed at least once • Queues are highly available, durable and performance efficient • Access is provided via REST
Queue Storage Concepts • Account → Queue → Message • Account: sally • Queue thumbnail jobs: messages “128x128, http://…” and “256x256, http://…” • Queue photo processing jobs: messages “http://…” and “http://…”
Account, Queues and Messages • An account can create many queues • Queue name is scoped by the account • Queue URL: http://<Account>.queue.core.windows.net/<QueueName> • A queue contains messages • No limit on number of messages stored in a queue • A message is stored for at most a week in a queue • Messages • Message size <= 8 KB • To store larger data, store data in blob/entity storage, and the blob/entity name in the message
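The advice for payloads over 8 KB can be sketched as a pair of helpers. The message layout (a dict with a `kind` field) and the dictionary standing in for blob storage are illustrative assumptions, not the service's wire format.

```python
MAX_MESSAGE_BYTES = 8 * 1024  # message size limit from the slide


def to_message(payload: bytes, blob_store: dict, blob_name: str) -> dict:
    """Inline small payloads; park large ones in blob storage and send the name."""
    if len(payload) <= MAX_MESSAGE_BYTES:
        return {"kind": "inline", "body": payload}
    blob_store[blob_name] = payload
    return {"kind": "blob-ref", "name": blob_name}


def from_message(message: dict, blob_store: dict) -> bytes:
    """Recover the payload on the consumer side."""
    if message["kind"] == "inline":
        return message["body"]
    return blob_store[message["name"]]


blobs = {}
small = to_message(b"resize 128x128", blobs, "job-1")
large = to_message(b"x" * 20000, blobs, "job-2")
print(small["kind"], large["kind"])
```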
Queue Programming API • Queues • Create/Delete/Clear queues • Inspect queue length • Messages • Enqueue (QueueName, message) • Dequeue (QueueName, invisibility time T) • Returns the message with a MessageID • Makes the message invisible for time T • Delete(QueueName, MessageID)
Dequeue and Delete Messages • [Diagram: producers P1 and P2 enqueue messages 1 to 4 on queue Q; consumers C1 and C2 dequeue from it] • 1. C1: Dequeue(Q, 30 sec) returns msg 1 • 2. C2: Dequeue(Q, 30 sec) returns msg 2
Dequeue and Delete Messages • [Diagram: producers P1 and P2, queue Q, consumers C1 and C2] • 1. C1: Dequeue(Q, 30 sec) returns msg 1 • 2. C2: Dequeue(Q, 30 sec) returns msg 2 • 3. C2 consumed msg 2 • 4. C2: Delete(Q, msg 2) • 5. C1 crashed • 6. msg 1 becomes visible again 30 seconds after its Dequeue • 7. Dequeue(Q, 30 sec) returns msg 1 • Benefit • Ensures that every message can be processed at least once
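The dequeue, crash, and redelivery sequence can be replayed against a small in-memory queue with an injectable clock. This is a sketch of the semantics only: `VisibilityQueue` is a hypothetical name, and it returns messages in insertion order, which the real service does not promise.

```python
import itertools


class VisibilityQueue:
    """Toy queue: a dequeued message is hidden for a while, not removed."""

    def __init__(self, clock):
        self._clock = clock              # callable returning the current time in seconds
        self._messages = {}              # message_id -> [body, visible_at]
        self._ids = itertools.count(1)

    def enqueue(self, body):
        message_id = next(self._ids)
        self._messages[message_id] = [body, self._clock()]
        return message_id

    def dequeue(self, invisibility_seconds):
        now = self._clock()
        for message_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + invisibility_seconds  # hide, do not delete
                return message_id, entry[0]
        return None

    def delete(self, message_id):
        self._messages.pop(message_id, None)

    def length(self):
        return len(self._messages)


now = [0.0]
q = VisibilityQueue(lambda: now[0])
q.enqueue("msg 1")
q.enqueue("msg 2")

c1 = q.dequeue(30)         # 1. C1 takes msg 1
c2 = q.dequeue(30)         # 2. C2 takes msg 2
q.delete(c2[0])            # 3-4. C2 finishes and deletes msg 2
# 5. C1 crashes and never deletes msg 1
now[0] = 29.0
too_early = q.dequeue(30)  # msg 1 is still invisible
now[0] = 30.0
redelivered = q.dequeue(30)  # 6-7. msg 1 is visible again and is handed out
print(redelivered)
```

Because only an explicit delete removes a message, a consumer that dies mid-task leaves its message to be picked up again after the timeout.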
Queue Best Practices • Make message processing idempotent • Need to deal with failures • No fixed order for dequeued messages • Invisible messages result in out of order processing • Use the queue length to scale your workers
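Two of these practices lend themselves to a short sketch: an idempotent handler and a worker count derived from queue length. Both functions are hypothetical, and the scaling rule (a fixed number of messages per worker, capped) is just one plausible policy.

```python
import math


def handle(message_id, body, results: dict) -> None:
    """Idempotent handler: a keyed write, so a redelivered message overwrites
    its earlier result instead of producing a duplicate."""
    results[message_id] = body.upper()


def workers_needed(queue_length: int, messages_per_worker: int, max_workers: int) -> int:
    """Pick a worker count from the current queue length, between 1 and max_workers."""
    wanted = math.ceil(queue_length / messages_per_worker)
    return max(1, min(max_workers, wanted))


results = {}
handle(1, "thumbnail", results)
handle(1, "thumbnail", results)  # same message delivered a second time
print(results, workers_needed(25, 10, 5))
```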
Windows Azure Data Storage Concepts • Account • Containers hold blobs: http://<account>.blob.core.windows.net/<container> • Tables hold entities: http://<account>.table.core.windows.net/<table> • Queues hold messages: http://<account>.queue.core.windows.net/<queue>
Takeaways: Windows Azure Storage • Enables developers to access storage • Massively scalable, durable, and available • Anywhere at anytime access • Automatically scales to meet peak traffic demands • Only pay for what the service uses • Easy to use REST and .NET interfaces • Blobs, tables, and queues
More Information • CTP Access, SDK, forums, white papers, talks • http://www.microsoft.com/azure/windowsazure.mspx • http://msdn.microsoft.com/en-us/azure/cc994380.aspx • Send talk feedback to • Twitter “@tweval” and specify “#wa-storage” and enter anywhere in the message a rating (0-10), 10 being highest • Go to http://tweval.com/wa-storage/ to see all messages and ratings • See recordings of two prior talks at MIX (yesterday) • MIX09-T07F – Overview of Windows Azure • MIX09-T09F – Building Web Applications with Windows Azure • Another session at 9:45am on 3/20/2009 in San Polo 3401 • MIX09-R81M – Using Windows Azure Tools for Microsoft Visual Studio to Build Cloud Services
Please Complete an Evaluation Form • Your feedback is important! • Evaluation forms can be found on each chair • Temp Staff at the back of the room have additional evaluation form copies
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.