MSDN Event: Windows Azure Storage
Windows Azure Storage • Storage in the Cloud • Scalable, durable, and available • Anytime, anywhere access • Only pay for what the service uses • Exposed via RESTful Web Services • Use from Windows Azure Compute • Use from anywhere on the internet • Various storage abstractions • Tables, Blobs, Queues, Drives • What about SQL Azure?
The Storage Client API • The main APIs are RESTful • Can call these from any HTTP client, e.g. Flash, Silverlight, etc… • For easier .NET access, the SDK provides a client API: Microsoft.WindowsAzure.StorageClient • Provides a strongly typed wrapper around the REST services
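Because the underlying interface is plain REST over HTTP, a blob in a public container can be read with any HTTP stack and no Azure-specific library at all. A minimal sketch (the account, container, and blob names below are hypothetical):

```csharp
using System;
using System.Net;

class RawRestExample
{
    static void Main()
    {
        // A blob in a public container is just a URL: any HTTP client can
        // GET it directly, without the StorageClient library.
        string url = "http://cohowinery.blob.core.windows.net/images/PIC01.JPG";
        using (var client = new WebClient())
        {
            byte[] bytes = client.DownloadData(url);
            Console.WriteLine("Downloaded {0} bytes", bytes.Length);
        }
    }
}
```

Authenticated operations additionally require signing each request with one of the account's shared secret keys, which is the main chore the strongly typed client library takes off your hands.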
Windows Azure Storage Account • User-specified, globally unique account name • Can choose the geo-location to host the storage account • US – “North Central” and “South Central” • Europe – “North” and “West” • Asia – “East” and “Southeast” • Can CDN-enable the account • Blobs delivered via 18 global CDN nodes • Can co-locate the storage account with a compute account • Explicitly or using affinity groups • Accounts have two independent 512-bit shared secret keys • 100TB per account
Storage in the Development Fabric • Provides local “mock” storage • Emulates storage in the cloud • Allows offline development • Requires SQL Express 2005/2008 or above • There are some differences between cloud and dev storage: http://msdn.microsoft.com/dd320275 • A good approach for developers: to test pre-deployment, push storage to the cloud first. Use the Dev Fabric for compute, connected to the cloud-hosted storage. Finally, move compute to the cloud.
Windows Azure Storage Abstractions • Blobs • Provide a simple interface for storing named files along with metadata for the file • Drives • Provides durable NTFS volumes for Windows Azure applications to use • Tables • Provide structured storage. A Table is a set of entities, which contain a set of properties • Queues • Provide reliable storage and delivery of messages for an application
Windows Azure Storage Account • User creates a globally unique storage account name • Choose geo-location to host storage account • Recommended:Co-locate storage account with compute account • Affinity Group
Blob Storage Concepts • Hierarchy: Account → Container → Blob • URL: http://&lt;account&gt;.blob.core.windows.net/&lt;container&gt;/&lt;blobname&gt; • Example: account cohowinery with container images (PIC01.JPG, PIC02.JPG) and container videos (VID1.AVI)
Blob Features and Functions • Programming Interfaces • PutBlob – insert or update a blob • GetBlob – get the whole blob or a specific range • DeleteBlob • CopyBlob • SnapshotBlob • LeaseBlob • Associate metadata with a blob • Standard HTTP metadata; metadata is &lt;name, value&gt; pairs, up to 8KB per blob
Blob Client Library
CloudStorageAccount account = CloudStorageAccount.FromConfigurationSetting("CloudStorageAccount");
CloudBlobClient blobClient = new CloudBlobClient(account.BlobEndpoint, account.Credentials);
// Create container
CloudBlobContainer cloudContainer = blobClient.GetContainerReference(containerName);
bool hasCreated = cloudContainer.CreateIfNotExist();
// Access a blob in the container
CloudBlob cloudBlob = cloudContainer.GetBlobReference(blobName);
// BlobRequestOptions has retry policy, timeout, etc.
BlobRequestOptions options = new BlobRequestOptions();
// Upload the local file to the Blob service
cloudBlob.UploadFile(uploadFileName, options);
// Download to a local file name
cloudBlob.DownloadToFile(downloadFileName, options);
Two types of Blobs under the hood • Block Blob • Targeted at streaming workloads • Each blob consists of a sequence of blocks • 2 Phase commit: Blocks are uploaded and then separately committed • Size limit 200GB per blob • Page Blob • Targeted at random read/write workloads • Each blob consists of an array of pages • Each page range write is committed on PUT • Size limit 1TB per blob
Block Blob – Streaming Workload with Random Reads and Committed Writes • Uploading a large blob (e.g. a 10 GB movie) as a sequence of blocks:
blobName = “blob.wmv”;
PutBlock(blobName, blockId1, block1Bits);
PutBlock(blobName, blockId2, block2Bits);
…
PutBlock(blobName, blockIdN, blockNBits);
PutBlockList(blobName, blockId1, blockId2, …, blockIdN);
• Benefits • Update the blob via blocks any way you want • Efficient continuation and retry • Parallel and out-of-order upload of blocks
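With the StorageClient library the same two-phase pattern looks roughly like the sketch below. It assumes an existing CloudBlobContainer named `container` and a hypothetical list of `blockStreams`; block IDs must be base64-encoded strings of equal length:

```csharp
// Sketch: two-phase block upload with Microsoft.WindowsAzure.StorageClient.
CloudBlockBlob blockBlob = container.GetBlockBlobReference("blob.wmv");

var blockIds = new List<string>();
for (int i = 0; i < blockStreams.Count; i++)
{
    // Block IDs must be base64-encoded and all the same length.
    string blockId = Convert.ToBase64String(BitConverter.GetBytes(i));
    blockBlob.PutBlock(blockId, blockStreams[i], null); // uploaded, uncommitted
    blockIds.Add(blockId);
}

// Commit phase: the blob only becomes visible once PutBlockList succeeds,
// which is what makes out-of-order and parallel block uploads safe.
blockBlob.PutBlockList(blockIds);
```

Because uncommitted blocks are independent, a failed or timed-out PutBlock can simply be retried without restarting the whole upload.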
Page Blob – Random Read/Write • Create MyBlob • Specify blob size = 10 GB (a 10 GB address space) • Fixed page size = 512 bytes • Random access operations • PutPage [512, 2048) • PutPage [0, 1024) • ClearPage [512, 1536) • PutPage [2048, 2560) • GetPageRange [0, 4096) returns the valid data ranges: [0, 512) and [1536, 2560) • GetBlob [1000, 2048) returns • All 0 for the first 536 bytes • Next 512 bytes are the data stored in [1536, 2048)
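The same sequence can be driven through the client library. A sketch, assuming an existing CloudBlobContainer named `container`; page writes must be 512-byte aligned:

```csharp
// Sketch: create a page blob and write one aligned page range.
CloudPageBlob pageBlob = container.GetPageBlobReference("MyBlob");
pageBlob.Create(10L * 1024 * 1024 * 1024); // 10 GB address space, reads as zeros

byte[] page = new byte[512];               // length must be a multiple of 512
// ... fill 'page' with data ...
using (var ms = new MemoryStream(page))
{
    // Roughly the equivalent of PutPage [512, 1024): each page-range write
    // is committed immediately on the PUT, unlike block blobs.
    pageBlob.WritePages(ms, 512);
}
```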
Windows Azure Content Delivery Network • Scenario • Frequently accessed blobs • Blobs accessed from many geographic locations • Windows Azure Content Delivery Network (CDN) • Cache and serve your Windows Azure Blobs from the network • 18 locations globally (US, Europe, Asia, Australia and South America), and growing • Benefit • Better experience for users far from blob service • Provide high-bandwidth content delivery, around the world, for popular events
Blob Tips • For high-throughput clients • Set ServicePointManager.DefaultConnectionLimit to allow parallel connections to the cloud service • Default value is 2 • Upload/download multiple files in parallel • ParallelOperationThreadCount in CloudBlobClient controls the parallelism for single-blob uploads • Parallel uploads are used when size >= 32MB • The BlobRequestOptions timeout should be set as a factor of the blob size and your connection bandwidth • Timeout = Size in KB / ExpectedThroughputInKBps • Use retry and exponential backoff for timeouts/server busy
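The tips above map onto a handful of knobs in the client library. A sketch, assuming an initialized `account` (the numeric values are illustrative, not recommendations):

```csharp
// Sketch: throughput tuning for blob transfers.
ServicePointManager.DefaultConnectionLimit = 24;    // default is only 2

CloudBlobClient blobClient =
    new CloudBlobClient(account.BlobEndpoint, account.Credentials);
blobClient.ParallelOperationThreadCount = 8;        // parallel single-blob upload

BlobRequestOptions options = new BlobRequestOptions
{
    // Timeout scaled to size/bandwidth, e.g. ~100 MB at ~1 MBps:
    Timeout = TimeSpan.FromSeconds((100 * 1024) / 1024),

    // Exponential backoff on timeouts / server busy: 3 retries,
    // starting from a 2-second delta.
    RetryPolicy = RetryPolicies.RetryExponential(3, TimeSpan.FromSeconds(2))
};
cloudBlob.UploadFile(uploadFileName, options);
```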
Windows Azure Drive • Provides a durable NTFS volume for Windows Azure applications • Use existing NTFS APIs to access a network attached durable drive • Benefits • Enables existing applications using NTFS to more easily migrate to the cloud • Durability and survival of data on application failover or hardware failure • A Windows Azure Drive is a Page Blob • Mounts Page Blob over the network as an NTFS drive • All flushed and unbuffered writes to drive are made durable to the Page Blob
Windows Azure Drives Capabilities • A Windows Azure Drive is a Page Blob formatted as an NTFS single-volume Virtual Hard Drive (VHD) • Drives can be up to 1TB • A Page Blob can only be mounted by one VM at a time for read/write • A VM can dynamically mount up to 16 drives • Remote access via the Page Blob • Can upload the VHD to a Page Blob using the blob interface, and then mount it as a Drive • Can download the Drive through the Page Blob API
Cloud Library Sample //Create Local Storage resource and initialize the local cache for drives CloudDrive.InitializeCache(localCacheDir, cacheSizeInMB); CloudStorageAccount account = CloudStorageAccount.FromConfigurationSetting("CloudStorageAccount"); //Create a cloud drive (PageBlob) CloudDrive drive = account.CreateCloudDrive(pageBlobUri); drive.Create(1000/* sizeInMB */); //Mount the network attached drive on the local file system string pathOnLocalFS = drive.Mount(cacheSizeInMB, DriveMountOptions.None); //Use NTFS APIs to Read/Write files to drive … //Snapshot drive while mounted to create backups Uri snapshotUri = drive.Snapshot(); //Unmount the drive drive.Unmount();
Windows Azure Tables • Provides Massively Scalable Structured Storage • Table can have billions of entities (rows) and TBs of data • Familiar and Easy to use API • WCF (ADO.NET) Data Services (“Astoria”) • .NET classes and LINQ • REST – with any platform or language
Table Storage Concepts • Hierarchy: Account → Table → Entity • Example: account cohowinery with table customers (entities with properties such as Name, Email) and table winephotos (entities with properties such as Photo ID, Date)
Entity Properties • Each entity can have up to 255 properties • Mandatory Properties for every entity • PartitionKey & RowKey • Uniquely identifies an entity • Defines the sort order • Timestamp • Optimistic Concurrency • No fixed schema for rest of properties • Each property is stored as a <name, typed value> pair • No schema stored for a table • Properties can be the standard .NET types • String, binary, bool, DateTime, GUID, int, int64, and double
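In the StorageClient library an entity is simply a class derived from TableServiceEntity, which supplies the three mandatory properties. A sketch; the Movie schema here is hypothetical:

```csharp
// Sketch: a table entity. TableServiceEntity provides PartitionKey,
// RowKey, and Timestamp; everything else is up to the application.
public class MovieEntity : TableServiceEntity
{
    public MovieEntity() { }                       // required for serialization

    public MovieEntity(string genre, string title)
        : base(genre /* PartitionKey */, title /* RowKey */) { }

    // Schema-free: just use the supported .NET property types.
    public int Year { get; set; }
    public double Rating { get; set; }
}
```

Because no schema is stored for the table, two entities in the same table can expose entirely different property sets.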
PartitionKey and Partitions • Every Table has a PartitionKey • It is the first property (column) of your Table • Used to group entities in the Table into partitions • A Table Partition • All entities in a Table with the same partition key value • RowKey provides uniqueness within a partition • PartitionKey is exposed in the programming model • Allows application to control the granularity of the partitions and scalability of its tables
Purpose of the Partition Key • Entity Locality • Entities in the same partition will be stored together • Efficient querying and cache locality • Entity Group Transactions • Atomically perform multiple Insert/Update/Delete over entities in same partition in a single transaction • Table Scalability • System monitors the usage patterns of partitions • Automatically load balance partitions • Each partition can be served by a different storage node • Scale to meet the traffic needs of your table
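Entity group transactions surface in the client library as a batched SaveChanges. A sketch, assuming an initialized CloudTableClient and a hypothetical MovieEntity class derived from TableServiceEntity:

```csharp
// Sketch: entity group transaction — all entities share PartitionKey "SciFi",
// so the batch commits or fails atomically on the server.
TableServiceContext context = tableClient.GetDataServiceContext();
context.AddObject("Movies", new MovieEntity("SciFi", "Sphere"));
context.AddObject("Movies", new MovieEntity("SciFi", "Star Wars"));
context.SaveChanges(SaveChangesOptions.Batch);
```

A batch spanning two different PartitionKey values would be rejected, which is one of the reasons the partition key choice matters.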
Partitions and Partition Ranges • Example: Table = Movies, initially served entirely by Server A • After load balancing, the key range is split: Server A serves [MinKey – Comedy), Server B serves [Comedy – MaxKey)
Table Operations • Table • Create, Query, Delete • Entities • Insert • Update • Merge – Partial update • Replace – Update entire entity • Delete • Query • Entity Group Transactions
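End to end, these operations look roughly as follows with the .NET client. The fragments below are a sketch assuming an initialized `account` and a hypothetical MovieEntity class derived from TableServiceEntity:

```csharp
// Sketch: table create, entity insert, point query, and delete.
CloudTableClient tableClient = account.CreateCloudTableClient();
tableClient.CreateTableIfNotExist("Movies");              // Table: Create

TableServiceContext context = tableClient.GetDataServiceContext();
context.AddObject("Movies", new MovieEntity("SciFi", "Star Wars"));
context.SaveChangesWithRetries();                         // Entity: Insert

var query = (from m in context.CreateQuery<MovieEntity>("Movies")
             where m.PartitionKey == "SciFi" && m.RowKey == "Star Wars"
             select m).AsTableServiceQuery();
MovieEntity movie = query.Execute().FirstOrDefault();     // Query: point lookup

context.DeleteObject(movie);                              // Entity: Delete
context.SaveChangesWithRetries();
```

Merge vs. Replace is chosen at save time; by default an update sent through the context performs a merge, leaving unnamed properties intact.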
Demo @AzureGreeter Twitter Table
Table Tips • Partition your data appropriately • Scale • Queries • Entity Group Transactions • Avoid “Append only” write patterns based on PartitionKey values • Avoid using monotonically increasing suffix with a constant prefix • Example: using only the current timestamp as PartitionKey • If needed, add varying prefix to PartitionKey
Table Tips – cont. • Avoid large table scans when performance is critical • Restructure your schema if required • Concatenate different keys to form appropriate index • Most Optimal: • PartitionKey == “SciFi” and RowKey == “Star Wars” • Scans: Expect continuation tokens (REST) • PartitionKey == “SciFi” and “Sphere” ≤ RowKey ≤ “Star Wars” • “Action” ≤ PartitionKey ≤ “Thriller” • PartitionKey == “Action” || PartitionKey == “Thriller” - currently scans entire table • “Cars” ≤ RowKey ≤ “Star Wars” - scans entire table
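The query shapes above can be written with LINQ; range predicates use String.CompareTo, which the data services LINQ provider translates into key-range filters. A sketch, assuming an existing TableServiceContext and a hypothetical MovieEntity class:

```csharp
// Sketch: a partition-scoped range scan. AsTableServiceQuery transparently
// follows the continuation tokens the REST interface returns for scans.
var scan = (from m in context.CreateQuery<MovieEntity>("Movies")
            where m.PartitionKey == "SciFi"
               && m.RowKey.CompareTo("Sphere") >= 0
               && m.RowKey.CompareTo("Star Wars") <= 0
            select m).AsTableServiceQuery();

foreach (MovieEntity m in scan.Execute())   // lazily pages through results
    Console.WriteLine(m.RowKey);
```

Dropping the PartitionKey predicate turns this into a full-table scan, which is exactly the pattern the tips above warn against.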
Queue Storage Concepts • Hierarchy: Account → Queue → Message • Example: account cohowinery with queue order processing; each message carries a customer ID, an order ID, and a blob reference (http://…)
Loosely Coupled Interaction with Queues • Enables workflow between roles • Load work into a queue • The producer can forget about a message once it is in the queue • Many workers consume the queue • Pattern: Web Roles enqueue work items into an input queue (Azure Queue); multiple Worker Roles dequeue and process them
Queues’ Reliable Delivery • Guaranteed delivery/processing of messages (two-step consumption) • A worker dequeues a message and it is marked invisible for a specified “invisibility time” • The worker deletes the message when it has finished processing it • If the worker role crashes, the message becomes visible again after the invisibility time, for another worker to process
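The two-step consumption pattern is a few lines with the client library. A sketch, assuming an initialized `account`; the queue name and ProcessOrder helper are hypothetical:

```csharp
// Sketch: reliable (two-step) queue consumption.
CloudQueueClient queueClient = account.CreateCloudQueueClient();
CloudQueue queue = queueClient.GetQueueReference("orderprocessing");
queue.CreateIfNotExist();

// Step 1: dequeue — the message becomes invisible to other workers
// for the given invisibility time.
CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromMinutes(2));
if (msg != null)
{
    ProcessOrder(msg.AsString);   // hypothetical work; must be idempotent

    // Step 2: delete only after processing succeeds. If the worker
    // crashes first, the message reappears for another worker.
    queue.DeleteMessage(msg);
}
```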
Demo @AzureGreeter Twitter Queue
Queue Tips • Messages can be up to 8KB • Use a blob to store large messages, and store the blob ref in the message • A message may be processed more than once • Make message processing idempotent • Work should be repeatable and safe to do multiple times • Assume messages put into the queue can be processed in any order • For higher throughput • Batch multiple work items into a single message or into a blob • Use multiple queues • Use DequeueCount to remove poison messages • Enforce a threshold on a message’s dequeue count • Monitor the message count to dynamically increase/reduce workers • Remember: queue names are lowercase only…
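The poison-message tip can be sketched as a guard around normal consumption. The fragment assumes an existing CloudQueue named `queue` and a hypothetical ProcessOrder helper; the threshold value is illustrative:

```csharp
// Sketch: poison-message guard using DequeueCount.
const int MaxDequeueCount = 5;

CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromMinutes(2));
if (msg != null)
{
    if (msg.DequeueCount > MaxDequeueCount)
    {
        // This message has repeatedly failed processing ("poison"):
        // remove it so it cannot block the queue; optionally log it
        // to a blob or table for later inspection.
        queue.DeleteMessage(msg);
        return;
    }

    ProcessOrder(msg.AsString);   // hypothetical, idempotent work
    queue.DeleteMessage(msg);
}
```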