Handling Big Data in Windows Azure Storage. Alan Smith, Active Solution, cloudcasts.net@gmail.com, @alansmith, www.cloudcasts.net.

Presentation Transcript


  1. Handling Big Data in Windows Azure Storage • Alan Smith, Active Solution • cloudcasts.net@gmail.com • @alansmith • www.cloudcasts.net

  2. On-Premise On-Premise Replication

  3. MSDN Universal - $150

  4. Implementation Challenges

  5. Text Search Implementation • Windows Azure Websites: Azure Wiki Website • Windows Azure Storage: Blob Storage – Pages, Table Storage – Text Index

  6. Text Index Table Design • Query on PartitionKey (word) • Ordered by RowKey (word count on page)
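Table Storage returns the entities of a partition in ascending lexicographic RowKey order, so a key that should rank pages by word count has to encode the count accordingly. The sketch below shows one possible encoding; it is an assumption for illustration, not the scheme used in the talk. `MAX_COUNT`, `make_row_key`, `make_entity`, and the key layout are all hypothetical.

```python
MAX_COUNT = 999_999  # hypothetical upper bound on occurrences of a word on one page


def make_row_key(word_count: int, page_id: int) -> str:
    """Build a RowKey whose ascending string order puts the highest count first."""
    if not 0 <= word_count <= MAX_COUNT:
        raise ValueError("word_count out of range")
    inverted = MAX_COUNT - word_count  # larger counts become smaller numbers
    # Zero-padding keeps string order identical to numeric order.
    return f"{inverted:06d}_{page_id:010d}"


def make_entity(word: str, page_id: int, word_count: int) -> dict:
    """Model a text index entity: the word is the partition, the count drives the row order."""
    return {
        "PartitionKey": word,
        "RowKey": make_row_key(word_count, page_id),
        "PageId": page_id,
        "WordCount": word_count,
    }


entities = [
    make_entity("moore", 2348931, 2),
    make_entity("moore", 2349268, 1),
    make_entity("moore", 2349746, 8),
]

# Sorting by RowKey mimics the order in which a partition query returns entities.
ordered = sorted(entities, key=lambda e: e["RowKey"])
```

With this layout, a query on `PartitionKey == "moore"` would return the page with eight occurrences before the pages with two and one.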

  7. Text Index Table Example

  8. Uploading Page Data • Upload Page Content to Blob Storage • 27 XML Content Files (41.4 GB - 4,356,508 Pages) • Windows Azure Storage: Blob Storage (4,356,508 Blobs)

  9. Creating Text Index Data • Parse Page Text • 27 XML Content Files (41.4 GB - 4,356,508 Pages) • Page IDs and Titles (124 MB) • Index Entries (19,277 Files - 9.83 GB)

  10. Index Data Files • Each file contains 1,000 lines • Each line contains 100 entries for a word (1 transaction)

      typical#2356523,1|2356987,1|2357098,1|2357186,1|2357237,1|2357704,1|2357705,1
      history#2375229,1|2375230,1|2375232,1|2375279,1|2375293,3|2375300,1|2375314,2
      renowned#2338682,1|2338841,2|2339194,1|2339509,1|2339791,1|2340298,1|2340408,1
      line#2372733,1|2372749,2|2372774,2|2372784,2|2372790,1|2372796,1|2372813,1
      varies#2316134,1|2317202,1|2318782,1|2319263,1|2319437,1|2319766,1|2319969,1
      moore#2348931,2|2349076,2|2349268,1|2349746,8|2349903,1|2350368,2|2350437,1
      journal#2371460,2|2371490,1|2371518,2|2371524,1|2371565,3|2371591,6|2371609,2
      elderly#2300000,2|2300127,1|2301060,1|2301207,1|2301873,1|2302199,1|2302733,1
      bearing#2331971,1|2332125,1|2332422,1|2332610,1|2333094,1|2333854,1|2334189,1
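The sample lines appear to follow the pattern `word#pageId,count|pageId,count|...`, where the second number is presumably the word count on that page (an inference from the table design, not something the slide states). Grouping 100 entries for one word per line fits the Table Storage batch limit of 100 entities that share a PartitionKey. A minimal parsing sketch under those assumptions, with `parse_index_line` as a hypothetical helper name:

```python
from typing import List, Tuple


def parse_index_line(line: str) -> Tuple[str, List[Tuple[int, int]]]:
    """Split one index line into its word and a list of (page_id, count) pairs."""
    word, _, payload = line.strip().partition("#")
    entries = []
    for item in payload.split("|"):
        if not item:
            continue  # tolerate an empty payload or a trailing separator
        page_id, count = item.split(",")
        entries.append((int(page_id), int(count)))
    return word, entries


# Sample line taken from the slide (the slide shows only the first seven entries).
sample = "moore#2348931,2|2349076,2|2349268,1|2349746,8|2349903,1|2350368,2|2350437,1"
word, entries = parse_index_line(sample)
```

Each parsed line then maps to one batch insert against the partition named by `word`. If every line were full, 19,277 files of 1,000 lines with 100 entries each would come to roughly 1.9 billion index entries.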

  11. Insert Index Entries • Windows Azure Services: Worker Roles • Windows Azure Storage: Blob Storage, Queue, Table Storage

  12. Insert Index Entries

  13. Windows Azure • On-Premise • Windows Azure Storage: Blobs, Tables, Queues • http://azurespeedtest.azurewebsites.net/

  14. Windows Azure • On-Premise • Windows Azure Virtual Machines: VM, VM • Windows Azure Storage: Blobs, Tables, Queues • http://azurespeedtest.azurewebsites.net/

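  15. ServicePointManager.DefaultConnectionLimit = 100;  // allow more concurrent connections per endpoint
      ServicePointManager.UseNagleAlgorithm = false;    // do not delay small packets
      ServicePointManager.Expect100Continue = false;    // skip the 100-Continue handshake before sending the body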

  16. Block Blob Operations • Single HTTP request for blob • Sequential HTTP requests for blocks • Parallel HTTP requests for blocks • Operations shown: Blob Upload, Block Upload, Block Commit

  17. Tuning Block Blob Operations • Single HTTP request for blob • Sequential HTTP requests for blocks • Parallel HTTP requests for blocks • Properties: SingleBlobUploadThresholdInBytes, StreamWriteSizeInBytes, ParallelOperationThreadCount • Operations shown: Blob Upload, Block Upload, Block Commit

  18. Tuning Blob Operations • CloudBlobClient • CloudBlockBlob
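The three properties on this slide decide how many HTTP requests an upload needs. The sketch below is a simplified model of that decision, not the client library's actual code, which applies further conditions. `plan_upload` and its parameter names are hypothetical; they stand in for SingleBlobUploadThresholdInBytes, StreamWriteSizeInBytes, and ParallelOperationThreadCount respectively.

```python
import math

MIB = 1024 * 1024


def plan_upload(blob_size: int, single_upload_threshold: int,
                block_size: int, parallel_threads: int) -> dict:
    """Estimate how a block blob upload is split into HTTP requests (simplified model)."""
    if blob_size <= single_upload_threshold:
        # Small blob: the whole payload goes up in one request.
        return {"mode": "single", "blocks": 0, "requests": 1}

    # Large blob: one request per block, plus one request to commit the block list.
    blocks = math.ceil(blob_size / block_size)
    mode = "parallel-blocks" if parallel_threads > 1 else "sequential-blocks"
    return {"mode": mode, "blocks": blocks, "requests": blocks + 1}


small = plan_upload(1 * MIB, 32 * MIB, 4 * MIB, parallel_threads=1)
large = plan_upload(100 * MIB, 32 * MIB, 4 * MIB, parallel_threads=4)
```

In this model a 100 MiB blob with 4 MiB blocks needs 25 block uploads and one commit, so raising the block size reduces the request count, while raising the thread count lets more block uploads overlap.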

  19. Parallel and Asynchronous Uploads • Parallel Blobs • Parallel Blocks • Parallel Blobs & Blocks (each diagram: Files, Blob Container, Blobs)
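The "Parallel Blobs" variant can be sketched with a thread pool that uploads several files at once. This is only an illustration of the pattern: `upload_blob` is a hypothetical stand-in that writes to an in-memory dictionary instead of calling a storage SDK, and `upload_all` and `max_workers` are illustrative names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

container = {}             # in-memory stand-in for a blob container
_lock = threading.Lock()   # protects the stand-in container across threads


def upload_blob(name: str, data: bytes) -> str:
    """Hypothetical upload call; a real version would issue HTTP requests."""
    with _lock:
        container[name] = data
    return name


def upload_all(files: dict, max_workers: int = 8) -> list:
    """Upload many blobs concurrently and return their names in submission order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(upload_blob, name, data) for name, data in files.items()]
        # result() re-raises any exception from the worker, so failures are not lost.
        return [future.result() for future in futures]


files = {f"page-{i}.xml": f"<page id='{i}'/>".encode() for i in range(20)}
uploaded = upload_all(files)
```

Because uploads are dominated by network waits, threads are usually sufficient here. Combining this with parallel block uploads inside each blob multiplies the number of open connections, which is where the connection limit setting from the earlier slide matters.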

  20. Storage Monitoring Tables • $MetricsCapacityBlob • $MetricsTransactionsBlob • $MetricsTransactionsTable • $MetricsTransactionsQueue

  21. Handling Outages • 29th February 2012 – Major outage due to certificate error • MVP Summit 2012 – February 28th – March 2nd • 22nd February 2013 – Storage outage due to certificate error • MVP Summit 2013 – February 18th – 22nd • MVP Summit November 2013 – November 18th – 21st • Correlation does not mean causation!

  22. Consider processing “In the Cloud” • Modify ServicePointManager Settings • Use Parallel and Asynchronous Actions • Tune CloudBlobClient and CloudBlockBlob properties • Fiddler is Your Friend (Especially the Timeline) • Use the Source (Windows Azure SDK on GitHub) • Understand Storage Emulator Limitations • Understand transient faults • Understand Pricing Implications • Leverage Storage Analytics
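For the "Understand transient faults" point, a common approach is to retry a failed storage call with an exponentially growing delay. The sketch below illustrates that general pattern under stated assumptions; it is not the retry policy shipped with the storage client library. `TransientError` and `with_retries` are hypothetical names.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for faults worth retrying (timeouts, throttling)."""


def with_retries(operation, max_attempts: int = 5, base_delay: float = 0.5,
                 sleep=time.sleep):
    """Call operation(), retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            delay = base_delay * 2 ** (attempt - 1)
            # A little random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 10))
```

The `sleep` parameter is injectable so the backoff schedule can be checked without actually waiting.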

  23. Thanks! http://wikisearch.azurewebsites.net/
