Handling Big Data in Windows Azure Storage Alan Smith Active Solution cloudcasts.net@gmail.com @alansmith www.cloudcasts.net
On-Premise On-Premise Replication
Text Search Implementation • Windows Azure Websites: Azure Wiki Website • Windows Azure Storage: Blob Storage – Pages • Windows Azure Storage: Table Storage – Text Index
Text Index Table Design • Query on PartitionKey (word) • Ordered by RowKey (word count on page)
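A minimal sketch of how such an index entity could be keyed. Table Storage returns rows within a partition sorted by RowKey as a string, so to get the highest word counts first the count is inverted and zero-padded here. The inversion, the `MAX_COUNT` bound, the RowKey layout and the property names `PageId` and `Count` are assumptions for illustration, not taken from the slides.

```python
MAX_COUNT = 999_999  # assumed upper bound on occurrences of one word on one page


def make_index_entity(word: str, page_id: int, count: int) -> dict:
    """Build a hypothetical index entity for one (word, page) pair."""
    # Invert the count so that larger counts produce smaller, earlier-sorting keys.
    inverted = MAX_COUNT - min(count, MAX_COUNT)
    return {
        "PartitionKey": word.lower(),                    # query on the word
        "RowKey": f"{inverted:06d}_{page_id:010d}",      # page id keeps the key unique
        "PageId": page_id,
        "Count": count,
    }


# Three entries for the same word, in arbitrary order.
entities = [
    make_index_entity("moore", 2348931, 2),
    make_index_entity("moore", 2349746, 8),
    make_index_entity("moore", 2349268, 1),
]

# Sorting by RowKey mimics the order in which a partition query returns rows.
ranked = sorted(entities, key=lambda e: e["RowKey"])
```

With this layout, a query on a single PartitionKey yields the pages with the most occurrences of the word first, without any client-side sorting.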
Uploading Page Data • Upload page content to Blob Storage (Windows Azure Storage) • Input: 27 XML content files (41.4 GB, 4,356,508 pages) • Output: Blob Storage (4,356,508 blobs)
Creating Text Index Data • Parse page text • Input: 27 XML content files (41.4 GB, 4,356,508 pages) • Output: page IDs and titles (124 MB) • Output: index entries (19,277 files, 9.83 GB)
Index Data Files • Each file contains 1,000 lines • Each line contains 100 entries for a word (1 transaction)
typical#2356523,1|2356987,1|2357098,1|2357186,1|2357237,1|2357704,1|2357705,1
history#2375229,1|2375230,1|2375232,1|2375279,1|2375293,3|2375300,1|2375314,2
renowned#2338682,1|2338841,2|2339194,1|2339509,1|2339791,1|2340298,1|2340408,1
line#2372733,1|2372749,2|2372774,2|2372784,2|2372790,1|2372796,1|2372813,1
varies#2316134,1|2317202,1|2318782,1|2319263,1|2319437,1|2319766,1|2319969,1
moore#2348931,2|2349076,2|2349268,1|2349746,8|2349903,1|2350368,2|2350437,1
journal#2371460,2|2371490,1|2371518,2|2371524,1|2371565,3|2371591,6|2371609,2
elderly#2300000,2|2300127,1|2301060,1|2301207,1|2301873,1|2302199,1|2302733,1
bearing#2331971,1|2332125,1|2332422,1|2332610,1|2333094,1|2333854,1|2334189,1
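The line format shown on this slide (`word#pageId,count|pageId,count|...`) can be parsed with a few string splits. The sketch below is an illustration only: the function names `parse_index_line` and `chunk` are hypothetical. The 100-entry grouping matches the Table service limit of 100 entities per batch, all sharing one PartitionKey, which is presumably why each line maps to one transaction.

```python
from typing import List, Tuple


def parse_index_line(line: str) -> Tuple[str, List[Tuple[int, int]]]:
    """Split 'word#id,count|id,count|...' into the word and (page_id, count) pairs."""
    word, _, payload = line.strip().partition("#")
    entries = []
    for item in payload.split("|"):
        if not item:
            continue  # tolerate an empty payload or a trailing separator
        page_id, count = item.split(",")
        entries.append((int(page_id), int(count)))
    return word, entries


def chunk(items: list, size: int = 100) -> List[list]:
    """Group items into batches of at most `size` (one batch per table transaction)."""
    return [items[i:i + size] for i in range(0, len(items), size)]


word, entries = parse_index_line("moore#2348931,2|2349076,2|2349268,1|2349746,8")
batches = chunk(entries)
```

Because every entry on a line belongs to the same word, each batch targets a single partition and can be submitted as one batch operation.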
Insert Index Entries • Diagram components: Worker Roles (Windows Azure Services); Blob Storage, Queue and Table Storage (Windows Azure Storage)
Diagram: On-Premise and Windows Azure Storage (Blobs, Tables, Queues) • http://azurespeedtest.azurewebsites.net/
Diagram: On-Premise, Windows Azure Virtual Machines (VM, VM) and Windows Azure Storage (Blobs, Tables, Queues) • http://azurespeedtest.azurewebsites.net/
ServicePointManager.DefaultConnectionLimit = 100;
ServicePointManager.UseNagleAlgorithm = false;
ServicePointManager.Expect100Continue = false;
Block Blob Operations • Blob Upload: single HTTP request for blob • Block Upload: sequential HTTP requests for blocks, or parallel HTTP requests for blocks • Block Commit
Tuning Block Blob Operations • SingleBlobUploadThresholdInBytes: single HTTP request for blob • StreamWriteSizeInBytes: sequential HTTP requests for blocks • ParallelOperationThreadCount: parallel HTTP requests for blocks • Blob Upload, Block Upload, Block Commit
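To make the interaction of the three settings concrete, here is a simplified model of the decision they drive. It is not the storage client library's code: the function `plan_upload`, the exact comparison against the threshold and the sizes used are assumptions chosen for illustration.

```python
import math

MB = 1024 * 1024


def plan_upload(blob_size: int,
                single_upload_threshold: int,
                block_size: int,
                parallel_threads: int) -> dict:
    """Model how many HTTP requests an upload needs under the three settings."""
    # Small blobs: one request uploads the whole blob.
    if blob_size <= single_upload_threshold:
        return {"mode": "single", "blocks": 0, "requests": 1}
    # Larger blobs: one request per block, plus one request to commit the block list.
    blocks = math.ceil(blob_size / block_size)
    mode = "parallel-blocks" if parallel_threads > 1 else "sequential-blocks"
    return {"mode": mode, "blocks": blocks, "requests": blocks + 1}


# Illustrative values only.
small = plan_upload(10 * MB, single_upload_threshold=32 * MB, block_size=4 * MB, parallel_threads=1)
large = plan_upload(100 * MB, single_upload_threshold=32 * MB, block_size=4 * MB, parallel_threads=4)
```

In this model a 100 MB blob with 4 MB blocks needs 25 block uploads and one commit. Raising the thread count does not change the request count, only how many block uploads are in flight at once.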
Tuning Blob Operations • CloudBlobClient • CloudBlockBlob
Parallel and Asynchronous Uploads • Parallel Blobs • Parallel Blocks • Parallel Blobs & Blocks • Diagram: Files, Blob Container, Blobs (one panel per option)
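The "parallel blobs" option can be sketched with a thread pool that uploads several files at once. The `fake_upload` function below is a stand-in that only sleeps briefly and returns the byte count; a real upload call would go in its place. All names here are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Tuple


def fake_upload(name: str, data: bytes) -> Tuple[str, int]:
    """Stand-in for a blob upload: waits briefly, then reports the bytes 'sent'."""
    time.sleep(0.01)  # simulated network latency
    return name, len(data)


def upload_all(files: Dict[str, bytes], max_workers: int = 8) -> Dict[str, int]:
    """Upload every file concurrently and return bytes uploaded per blob name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fake_upload, name, data)
                   for name, data in files.items()}
        # result() blocks until each upload finishes and re-raises any failure.
        return {name: future.result()[1] for name, future in futures.items()}


uploaded = upload_all({"pages-01.xml": b"abc", "pages-02.xml": b"abcde"})
```

Uploads spend most of their time waiting on the network, so threads overlap well here. How many workers help in practice depends on bandwidth and on the connection limit configured on the client.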
Storage Monitoring Tables • $MetricsCapacityBlob • $MetricsTransactionsBlob • $MetricsTransactionsTable • $MetricsTransactionsQueue
Handling Outages • 29th February 2012 – Major outage due to certificate error • MVP Summit 2012 – February 28th – March 2nd • 22nd February 2013 – Storage outage due to certificate error • MVP Summit 2013 – February 18th – 22nd • MVP Summit November 2013 – November 18th – 21st • Correlation does not mean causation!
Consider processing “In the Cloud” • Modify ServicePointManager Settings • Use Parallel and Asynchronous Actions • Tune CloudBlobClient and CloudBlockBlob properties • Fiddler is Your Friend (Especially the Timeline) • Use the Source (Windows Azure SDK on GitHub) • Understand Storage Emulator Limitations • Understand transient faults • Understand Pricing Implications • Leverage Storage Analytics
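For the "understand transient faults" point, a common pattern is to retry a failed operation with exponential backoff and jitter. The sketch below is generic and assumes a hypothetical `TransientError` exception type. It does not reproduce the retry policies built into the Azure storage client.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying (timeouts, throttling)."""


def with_retries(operation, max_attempts: int = 4, base_delay: float = 0.01,
                 sleep=time.sleep):
    """Call operation(), retrying on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            delay = base_delay * (2 ** (attempt - 1))
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay))


calls = {"n": 0}


def flaky_insert():
    """Fails twice, then succeeds, to exercise the retry loop."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("simulated throttling")
    return "ok"


result = with_retries(flaky_insert, sleep=lambda seconds: None)
```

Only errors known to be transient should be retried. Anything else, such as an authentication failure or a malformed request, should fail immediately.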
Thanks! http://wikisearch.azurewebsites.net/