Microsoft Big Data Essentials
Module 1 - Introduction to Big Data
Saptak Sen, Microsoft
Bill Ramos, Advaiya
Agenda
• Why Big Data?
• Big Data Lambda Architecture
• Getting started with Windows Azure HDInsight Service
The Business Imperative
1. Human Fault Tolerance
2. Minimize CapEx
3. Hyper Scale on Demand
4. Low Learning Curve
CAP Theorem
• C: Consistency
• A: Availability
• P: Partition Tolerance
Big Data Lambda Architecture
Batch layer
• Stores master dataset
• Compute arbitrary views
Speed layer
• Fast, incremental algorithms
• Batch layer eventually overrides speed layer
Serving layer
• Random access to batch views
• Updated by batch layer
The Batch Layer
• Stores master dataset (in append mode)
• Unrestrained computation
• Horizontally scalable
• High latency
[Diagram: Incoming data streams, Master dataset, Batch views]
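To make the batch layer's two properties concrete (an append-only master dataset, and views recomputed over all of it), here is a minimal in-memory sketch. It is an illustration only: the page-view events and the names `master_dataset`, `append_events`, and `compute_batch_view` are hypothetical and not taken from the slides, and a real batch layer would run this kind of computation on HDInsight rather than in one process.

```python
from collections import Counter

# Hypothetical master dataset: an append-only list of (user, page) events.
master_dataset = []


def append_events(events):
    """Append new records; existing records are never updated or deleted."""
    master_dataset.extend(events)


def compute_batch_view(dataset):
    """Recompute the view from scratch over the entire master dataset."""
    return Counter(page for _user, page in dataset)


append_events([("alice", "/home"), ("bob", "/home")])
append_events([("alice", "/pricing")])

batch_view = compute_batch_view(master_dataset)
print(batch_view["/home"])  # page views for /home across all stored events
```

Recomputing from the full dataset is what makes the batch layer high latency, but it also means a bug in the view logic can be fixed by rerunning the computation over unchanged data.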
The Speed Layer
• Stream processing of data
• Stores a limited window of data
• Dynamic computation
[Diagram: Incoming data streams, Process stream, Increment views, Real-time increments, Real-time views]
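The speed layer's behavior (process each event as it arrives, increment a view, keep only a limited window) can be sketched as follows. This is an assumption-laden illustration, not the deck's implementation: the `SpeedLayer` class, the 60-second window, and the page-view events are hypothetical, and events are assumed to arrive in timestamp order.

```python
from collections import Counter, deque


class SpeedLayer:
    """Keeps an incrementally updated view over a sliding time window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()           # (timestamp, page), oldest first
        self.realtime_view = Counter()  # page -> count within the window

    def process(self, timestamp, page):
        """Apply one incoming event as an increment, then drop expired data."""
        self.events.append((timestamp, page))
        self.realtime_view[page] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Remove events that have fallen out of the window and undo their increments.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_page = self.events.popleft()
            self.realtime_view[old_page] -= 1
            if self.realtime_view[old_page] == 0:
                del self.realtime_view[old_page]


speed = SpeedLayer(window_seconds=60)
speed.process(0, "/home")
speed.process(30, "/home")
speed.process(70, "/pricing")  # the event at t=0 is now outside the window
print(dict(speed.realtime_view))
```

Each event costs a constant amount of work, which is the trade the speed layer makes: low latency in exchange for holding only recent data.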
The Serving Layer
• Queries the batch and real-time views
• Merges the results
[Diagram: Batch views, Real-time views, Querying and merging, Output]
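The query-and-merge step on this slide can be sketched in a few lines. The sketch assumes both views are simple page-to-count mappings and that the real-time view covers only events not yet absorbed into the batch view, so the two counts can be added without double counting. The function names and sample numbers are hypothetical.

```python
from collections import Counter


def query(page, batch_view, realtime_view):
    """Answer a single query by combining the batch and real-time counts."""
    return batch_view.get(page, 0) + realtime_view.get(page, 0)


def merged_view(batch_view, realtime_view):
    """Merge both views into one result covering every known page."""
    return Counter(batch_view) + Counter(realtime_view)


# Hypothetical views: counts are illustrative only.
batch_view = {"/home": 120, "/pricing": 45}
realtime_view = {"/home": 3, "/signup": 1}

print(query("/home", batch_view, realtime_view))
print(merged_view(batch_view, realtime_view))
```

Addition works here because counts are additive; other aggregates (distinct users, for example) would need a merge function suited to that aggregate.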
Microsoft Lambda Architecture Support
Batch Layer
• Windows Azure HDInsight
• Azure Blob storage
• MapReduce, Hive, Pig, Oozie, SSIS
Speed Layer
• Federations in Windows Azure SQL Database
• Azure tables
• Memcached/MongoDB
• SQL Server database engine
• SQL Server VM: Columnstore indexes, Analysis Services
• StreamInsight
Serving Layer
• Azure Storage Explorer
• Microsoft Excel
• Power Query
• PowerPivot
• Power View
• Power Map
• Reporting Services
• LINQ to Hive
• Analysis Services
Yahoo!
[Diagram: Batch Layer, Speed Layer, Serving Layer. Components shown:]
• Apache Hadoop (Hadoop Data)
• SQL Server Connector (Hadoop Hive ODBC)
• Staging Database
• Third Party Database + Custom Applications
• SQL Server Analysis Services (SSAS Cube)
• Microsoft Excel and PowerPivot for Excel
• Other BI Tools and Custom Applications
Ferranti Computer Systems
[Diagram: Batch Layer, Speed Layer, Serving Layer. Components shown:]
• Data Feed from Smart Meters
• Reactive Extensions (Rx)
• Windows Azure HDInsight
• SQL Server Database (In-Memory OLTP)
• Microsoft Dynamics AX
• SQL Server Analysis Services
• SQL Server Reporting Services
Demo 1: Setting up the Windows Azure storage account
[Diagram: Batch Layer, Speed Layer, Serving Layer. Components shown: Windows Azure Blob storage, Azure Storage Explorer]
Blob Storage Concepts
• Store large amounts of unstructured text or binary data with the fastest read performance
• Highly scalable, durable, and available file system
• Blobs can be exposed publicly over HTTP
• Securely lock down permissions to blobs
• Blob URL format: http://<account>.blob.core.windows.net/<container>/<blobname>
[Diagram: Account, Container, Blob, Pages/Blocks. Example: account Contoso; container Images holds PIC01.JPG and PIC02.JPG; container Video holds VID1.AVI; each blob is a block or page blob]
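The blob address pattern from this slide can be illustrated with a small helper that fills in the three placeholders. This is a sketch only: `blob_url` is a hypothetical helper, not part of any Azure SDK, and the lowercase account and container names are assumptions loosely based on the Contoso example in the diagram.

```python
from urllib.parse import quote


def blob_url(account, container, blob_name):
    """Build http://<account>.blob.core.windows.net/<container>/<blobname>."""
    # Percent-encode the blob name so spaces and similar characters are URL-safe.
    return f"http://{account}.blob.core.windows.net/{container}/{quote(blob_name)}"


# Hypothetical names for illustration.
print(blob_url("contoso", "images", "PIC01.JPG"))
# prints: http://contoso.blob.core.windows.net/images/PIC01.JPG
```

Whether such a URL can actually be fetched depends on the container's access permissions, as the slide notes.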
Demo 2: Setting up the Windows Azure HDInsight cluster
[Diagram: Batch Layer, Speed Layer, Serving Layer. Components shown: Windows Azure HDInsight, Windows Azure Blob storage, HDInsight Console]
Cluster URL: https://<ClusterName>.azurehdinsight.net/
Demo 3: Loading data into Windows Azure storage for use with HDInsight
[Diagram: Batch Layer, Speed Layer, Serving Layer. Components shown: CSV files from local disk, Windows Azure Blob storage, Windows Azure HDInsight, HDInsight Console]
Cluster URL: https://<ClusterName>.azurehdinsight.net/
Easy Access to Data, Big & Small
Search, Access & Shape
• Simplify access to public & corporate data
• Easily preview, shape, & format your data
Combine with Unstructured
• Combine and refine data across multiple sources
• Gain insight across relational, unstructured, & semi-structured data
Easily Manage & Query
• Common management of structured & unstructured data
• Query across relational DB & Hadoop with single T-SQL Query
Key Features
• Power Query
• Windows Azure Marketplace
• Windows Azure HDInsight Service
• Parallel Data Warehouse with Polybase
Learn more
• Getting Started with HDInsight
  http://blogs.msdn.com/b/windowsazure/archive/2013/03/19/getting-started-with-hdinsight.aspx
• Azure HDInsight and Azure Storage
  http://blogs.msdn.com/b/windowsazure/archive/2013/03/21/azure-hdinsight-and-azure-storage.aspx