Microsoft Big Data Essentials Module 1 - Introduction to Big Data

Saptak Sen, Microsoft; Bill Ramos, Advaiya. Agenda: Why Big Data? • Big Data Lambda Architecture • Getting started with Windows Azure HDInsight Service.

Presentation Transcript


  1. Microsoft Big Data Essentials, Module 1 - Introduction to Big Data • Saptak Sen, Microsoft • Bill Ramos, Advaiya

  2. Agenda • Why Big Data? • Big Data Lambda Architecture • Getting started with Windows Azure HDInsight Service

  3. The Business Imperative • 1. Human Fault Tolerance • 2. Minimize CapEx • 3. Hyper Scale on Demand • 4. Low Learning Curve

  4. CAP Theorem • C: Consistency • A: Availability • P: Partition Tolerance

  5. Big Data Lambda Architecture

  6. Big Data Lambda Architecture • Batch layer: stores the master dataset; computes arbitrary views • Speed layer: fast, incremental algorithms; the batch layer eventually overrides the speed layer • Serving layer: random access to batch views; updated by the batch layer

  7. The Batch Layer • Stores master dataset (in append mode) • Unrestrained computation • Horizontally scalable • High latency • Diagram labels: Incoming data streams, Master dataset, Batch views
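
The batch-layer idea on the slide above can be sketched in a few lines of Python. This is a minimal illustration, not the deck's implementation: `MasterDataset`, `compute_batch_view`, and the page-view events are hypothetical names and data chosen for the example, and an in-memory list stands in for a distributed store such as Azure Blob storage.

```python
from collections import Counter


class MasterDataset:
    """Hypothetical in-memory stand-in for an append-only master dataset."""

    def __init__(self):
        self._events = []

    def append(self, event):
        # Records are only ever added, never updated or deleted.
        self._events.append(event)

    def scan(self):
        # A batch job reads the entire dataset on every run.
        return iter(self._events)


def compute_batch_view(master):
    """Recompute a view (page views per URL) from all data, from scratch."""
    return Counter(event["url"] for event in master.scan())


master = MasterDataset()
for url in ["/home", "/docs", "/home"]:
    master.append({"url": url})

batch_view = compute_batch_view(master)
print(batch_view["/home"])  # 2
```

Because the view is always recomputed from the raw, immutable events, a bug in the view logic can be fixed by correcting the function and rerunning the batch job, at the cost of high latency.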

  8. The Speed Layer • Stream processing of data • Stores a limited window of data • Dynamic computation • Diagram labels: Incoming data streams, Process stream, Increment views, Real-time increments, Real-time views

  9. The Serving Layer • Queries the batch and real-time views • Merges the results • Diagram labels: Batch views, Real-time views, Querying and merging, Output
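
As a rough sketch of the query-and-merge step described on the serving-layer slide, the Python below combines a precomputed batch view with the increments held in a real-time view. The function names, the count-based views, and the sample numbers are assumptions made for illustration; the deck does not prescribe a merge function, and summation only fits views that are additive counters.

```python
def query(key, batch_view, realtime_view):
    """Answer a single lookup by combining both views.

    Assumes both views hold additive counts: the batch view covers
    everything up to the last batch run, and the real-time view covers
    only the events that arrived since then.
    """
    return batch_view.get(key, 0) + realtime_view.get(key, 0)


def merge_views(batch_view, realtime_view):
    """Merge the two views into one result set for all keys."""
    merged = dict(batch_view)
    for key, increment in realtime_view.items():
        merged[key] = merged.get(key, 0) + increment
    return merged


# Hypothetical sample data: page-view counts per URL.
batch_view = {"/home": 120, "/docs": 45}
realtime_view = {"/home": 3, "/pricing": 1}

print(query("/home", batch_view, realtime_view))  # 123
print(merge_views(batch_view, realtime_view))
```

When the next batch run finishes, its view already includes the recent events, so the matching real-time increments would be discarded; that is the sense in which the batch layer eventually overrides the speed layer.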

  10. Microsoft Lambda Architecture Support • Layers shown: Batch Layer, Speed Layer, Serving Layer • Federations in Windows Azure SQL Database • Azure tables • Memcached/MongoDB • SQL Server database engine • SQL Server VM: Columnstore indexes, Analysis Services • StreamInsight • Azure Storage Explorer • Microsoft Excel • Power Query • PowerPivot • Power View • Power Map • Reporting Services • LINQ to Hive • Analysis Services • Windows Azure HDInsight • Azure Blob storage • MapReduce, Hive, Pig, Oozie, SSIS

  11. Yahoo! • Layers shown: Batch Layer, Speed Layer, Serving Layer • Apache Hadoop • Hadoop Data • Staging Database • SQL Server Connector (Hadoop Hive ODBC) • SQL Server Analysis Services (SSAS Cube) • Third Party Database + Custom Applications • Microsoft Excel & PowerPivot for Excel • Other BI Tools and Custom Applications

  12. Ferranti Computer Systems • Layers shown: Batch Layer, Speed Layer, Serving Layer • Data Feed from Smart Meters • Reactive Extensions (Rx) • Windows Azure HDInsight • SQL Server Database (In-Memory OLTP) • Microsoft Dynamics AX • SQL Server Analysis Services • SQL Server Reporting Services

  13. Windows Azure Storage

  14. Demo 1: Setting up the Windows Azure storage account • Layers shown: Batch Layer, Speed Layer, Serving Layer • Windows Azure Blob storage • Azure Storage Explorer

  15. Blob Storage Concepts • Hierarchy: Account, Container, Blob, Pages/Blocks • Store large amounts of unstructured text or binary data with the fastest read performance • Highly scalable, durable, and available file system • Blobs can be exposed publicly over HTTP • Securely lock down permissions to blobs • URL format: http://<account>.blob.core.windows.net/<container>/<blobname> • Example: account Contoso; container Images with blobs PIC01.JPG and PIC02.JPG; container Video with blob VID1.AVI; each blob is stored as blocks or pages
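
The URL pattern on the blob storage slide can be made concrete with a short Python helper. This is only a sketch: `blob_url` is a hypothetical function written for this example, not part of any Azure SDK, and the lowercase `contoso` and `images` values are adapted from the slide's Contoso/Images example because Azure storage account and container names must be lowercase.

```python
from urllib.parse import quote


def blob_url(account, container, blob_name, scheme="http"):
    """Build a blob URL following the pattern shown on the slide:
    http://<account>.blob.core.windows.net/<container>/<blobname>
    """
    # Percent-encode the blob name; "/" is left intact so that
    # virtual folder paths such as "2013/PIC01.JPG" still work.
    encoded_name = quote(blob_name)
    return f"{scheme}://{account}.blob.core.windows.net/{container}/{encoded_name}"


print(blob_url("contoso", "images", "PIC01.JPG"))
# http://contoso.blob.core.windows.net/images/PIC01.JPG
```

A URL built this way only resolves anonymously if the container or blob has been made public; otherwise the request must carry credentials, which is outside the scope of this sketch.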

  16. Getting started with HDInsight Service

  17. Demo 2: Setting up the Windows Azure HDInsight cluster • Layers shown: Batch Layer, Speed Layer, Serving Layer • Windows Azure HDInsight • Windows Azure Blob storage • HDInsight Console • https://<ClusterName>.azurehdinsight.net/

  18. Demo 3: Loading data into Windows Azure storage for use with HDInsight • Layers shown: Batch Layer, Speed Layer, Serving Layer • CSV files from local disk • Windows Azure Blob storage • Windows Azure HDInsight • HDInsight Console • https://<ClusterName>.azurehdinsight.net/

  19. Easy Access to Data, Big & Small

  20. Easy Access to Data, Big & Small • Search, Access & Shape: simplify access to public & corporate data; easily preview, shape, & format your data • Combine with Unstructured: combine and refine data across multiple sources; gain insight across relational, unstructured, & semi-structured data • Easily Manage & Query: common management of structured & unstructured data; query across relational DB & Hadoop with a single T-SQL query • Key Features: Power Query, Windows Azure Marketplace, Windows Azure HDInsight Service, Parallel Data Warehouse with Polybase

  21. Learn more • Getting Started with HDInsight: http://blogs.msdn.com/b/windowsazure/archive/2013/03/19/getting-started-with-hdinsight.aspx • Azure HDInsight and Azure Storage: http://blogs.msdn.com/b/windowsazure/archive/2013/03/21/azure-hdinsight-and-azure-storage.aspx

  22. Questions?
