1 / 22

Making your Apps Smarter with Azure HDInsight

Making your Apps Smarter with Azure HDInsight. Matt Winkler, @ mwinkle Principal Program Manager 3-529. Agenda. What is HDInsight ? Programming Hadoop Integrating with your apps. Windows Azure HDInsight. On demand Apache Hadoop Clusters. Elastic – what you want, when you want it

claus
Download Presentation

Making your Apps Smarter with Azure HDInsight

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Making your Apps Smarter with Azure HDInsight Matt Winkler, @mwinkle Principal Program Manager 3-529

  2. Agenda • What is HDInsight? • Programming Hadoop • Integrating with your apps

  3. Windows Azure HDInsight • On demand Apache Hadoop Clusters • Elastic – what you want, when you want it • Simple – 3 clicks, 10 minutes to Hadoop • Secure – isolated, secured by default • Big data processing on top of Azure Storage

  4. Demo: HDInsight 0 – 60

  5. Hadoop there it is • “Hadoop is a distributed system for counting words.” • scalding readme • Storage + proliferation of compute models for data processing at scale • Began life as an open source implementation of Google’s Map/Reduce and GFS papers • In use at many major web companies at massive scale (1000’s of node, PB’s of storage)

  6. Cascading Storm Map/Reduce Drill Hive Scalding Pig Scoobi Mahout Oozie Sqoop Pegasus

  7. But I don’t have big data! • “we should stop acting as if our goal is to author extremely elegant theories, and instead embrace complexity and make use of the best ally we have: the unreasonable effectiveness of data. ” • Peter Norvig, et al • Cloud as global aggregation point for sensors & devices • Data born in the cloud in apps and services • In your app, what data are you not collecting? Why? What could you do with it?

  8. Acquire – Compute – Publish – Consume

  9. Query Query Shape Shape Acquire Publish Leverage Acquire Publish Experiment Experiment Azure Blob Storage Azure Blob Storage

  10. Consume Query Query Shape Shape Acquire Publish Leverage Acquire Publish Experiment Experiment Azure Blob Storage Azure Blob Storage

  11. Authoring Jobs App Integration End User Tooling (IDE’s, Analyst tools, Command lines) • Innovation flows upward • New compute models • Perf enhancements Breadth of Clients (Java, JS, .NET, etc) • Lightweight • Low cost to extend • Scenario oriented Connectivity Programmability Security Loosely coupled Extend breadth & depth Enable new scenarios Integrate with current tool chains Consistent REST API’s Core Hadoop Authoring frameworks and languages

  12. Acquire • App/Services writing to Blob • Data available via services • Push to blob (on-prem => cloud) • Copy To Blob

  13. Compute • No one tool to rule them all • Preprocessing – Cleansing / Shaping / Enriching • Traditional Analytics – Query Advanced Analytics – Experiments / Models • SDK & Command line tools for submitting and managing these jobs

  14. Publish • Back to Blobs • Into reporting engine • Into “online” store – SQL, Mongo, etc • As another table

  15. Consume Results • Data Explorer • ODBC/JDBC • REST API’s • Direct from Blob Storage • Additional Hadoop jobs

  16. What Next? • Scheduled execution • More sophisticated analysis • Analyze impact of changes • Incorporate other data sources

  17. HDInsight Status • Azure Public Preview • Available in 2 Azure Regions • Discounted compute pricing • SDK & CLI tools available

  18. HDInsight, What’s Next? • Committed SLA • Globally available • Continuing to improve Hadoop (including Stinger) • Continued Investments in Cluster Configuration, Management, Integration and Developer tooling

  19. Resources • big data on azure • hdinsight getting started • hadoopsdk for .net (includes PowerShell for now) • node.js sdk • x-plat cli

  20. Required Slide *delete this box when your slide is finalized Your MS Tag will be inserted here during the final scrub. Evaluate this session • Scan this QR codeto evaluate this session and be automatically entered in a drawing to win a prize!

More Related