
An introduction to HDInsight

An introduction to HDInsight. Edinson Medina, SR PFE for Data and AI, Microsoft Services.


Presentation Transcript


  1. An introduction to HDInsight Edinson Medina SR PFE for Data and AI Microsoft Services

  2. Who Am I? Edinson Medina SR PFE Data and AI Domain Microsoft Services UK Venezuelan @sqldixitox https://www.linkedin.com/in/edinsonmedina/

  3. Roles in the room?

  4. What is Big Data?
     • Data that is too large or complex for analysis in traditional relational databases
     • Typified by the “3 V’s”:
       • Volume – Huge amounts of data to process
         • Could be TBs, PBs or EBs
       • Variety – A mixture of structured and unstructured data
         • Structured, Semi-structured, Unstructured
       • Velocity – New data generated extremely frequently
         • Stream Processing, Real Time, Batch
     • Examples: sensor and IoT processing, web server click-streams, social media sentiment analysis

  5. • Batch Processing: Filter, cleanse, and shape data for analysis
     • Real-Time Processing: Capture, filter, and aggregate streams of data for low-latency querying
     • Predictive Analytics: Apply statistical algorithms for classification, regression, clustering, and prediction

  6. What is Hadoop?
     • Big Data is not the same as Hadoop
     • What is the MapReduce process?
     • What is HDFS?
     • MapReduce Engine vs Tez Engine
     [Diagram: a Hadoop cluster (head node and worker nodes, with HDFS) runs a word count over the sentence “Map Reduce can Map and Reduce data”. The workers emit pairs such as Map:1, Reduce:1, can:1, and:1, data:1; the combined result is Map:2, Reduce:2, can:1, and:1, data:1.]
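The word count on this slide can be sketched in plain Python. This is only an illustration of the map, shuffle, and reduce steps, not Hadoop code: the function names and the split of the sentence into two chunks are assumptions made for the example, and everything runs in one process rather than across worker nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word, 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted counts by their key (the word)."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(chunks):
    """Run map over every chunk, then shuffle and reduce the combined pairs."""
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle(mapped))


# Hypothetical split of the slide's sentence across two worker nodes.
chunks = ["Map Reduce can", "Map and Reduce data"]
counts = word_count(chunks)
print(counts)  # {'Map': 2, 'Reduce': 2, 'can': 1, 'and': 1, 'data': 1}
```

In a real cluster each chunk would be a block of a file in HDFS, and the map and reduce functions would run on different nodes.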

  7. set hive.execution.engine=mr; SELECT…
     set hive.execution.engine=tez; SELECT…
     [Diagram: the Map and Reduce stages for each engine]

  8. What is HDInsight?
     • Microsoft’s Hadoop distribution
     • Powered by the cloud
     • 100% Apache Hadoop
     • Immersive insights

  9. Hadoop ecosystem in HDInsight
     • Spark
     • Streaming (Storm)
     • Metadata (HCatalog)
     • Graph (Pegasus)
     • Stats processing (RHadoop)
     • Business Intelligence (Excel, Power View, SSAS…)
     • Active Directory (Ranger)
     • Pipeline / workflow (Oozie)
     • NoSQL Database (HBase)
     • Data Integration (ODBC / Sqoop / REST)
     • Scripting (Pig)
     • Query (Hive)
     • Machine Learning (Mahout)
     • Distributed Processing (MapReduce or Tez)
     • System Center (Future)
     • Log file aggregation (Flume)
     • YARN
     • Distributed Storage (HDFS)

  10. Hive
      • A metadata service that projects tabular schemas over folders
      • Enables the contents of folders to be queried as tables, using SQL-like query semantics
      • Queries are translated into jobs
      • Execution engine can be Tez or MapReduce
      SELECT…
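The idea of projecting a tabular schema over a folder can be illustrated with a small Python sketch. It is not how the service on this slide is implemented: the schema, the file names, the sample rows, and the `scan_table` helper are all hypothetical, and the "queries" are ordinary Python expressions rather than jobs on a cluster.

```python
import csv
import tempfile
from collections import Counter
from pathlib import Path

# Hypothetical schema: column names and types applied to raw files at read time.
SCHEMA = [("user", str), ("page", str), ("seconds", int)]


def scan_table(folder, schema):
    """Yield every line of every file in the folder as a typed row (a dict)."""
    for path in sorted(Path(folder).iterdir()):
        with path.open(newline="") as handle:
            for fields in csv.reader(handle):
                yield {name: cast(value) for (name, cast), value in zip(schema, fields)}


with tempfile.TemporaryDirectory() as folder:
    # The "table" is just a folder of delimited files with no embedded schema.
    Path(folder, "part-0000.csv").write_text("ann,home,12\nbob,cart,30\n")
    Path(folder, "part-0001.csv").write_text("ann,cart,7\n")

    # Roughly: SELECT page, COUNT(*) FROM visits GROUP BY page
    visits_per_page = Counter(row["page"] for row in scan_table(folder, SCHEMA))

    # Roughly: SELECT SUM(seconds) FROM visits WHERE user = 'ann'
    ann_seconds = sum(
        row["seconds"] for row in scan_table(folder, SCHEMA) if row["user"] == "ann"
    )

print(dict(visits_per_page), ann_seconds)
```

The files never change; only the schema applied when reading them decides what the table looks like, which is why adding a file to the folder adds rows to the table.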

  11. Pig performs a series of transformations on data relations, based on Pig Latin statements
      • Relations are loaded using schema-on-read semantics to project table structure at runtime
      • You can run Pig Latin statements interactively in the Grunt shell, or save them in a script file and run them as a batch

  12. Oozie
      • A workflow engine for actions in a Hadoop cluster
        • MapReduce
        • Hive
        • Pig
        • Others
      • Supports parallel workstreams and conditional branching

  13. Sqoop is a database integration service
      • Built on open source Hadoop technology
      • Enables bi-directional data transfer between Hadoop clusters and databases via JDBC

  14. HBase
      • A low-latency NoSQL database built on Hadoop
      • Modeled on Google’s Bigtable
      • HBase stores data in StoreFiles on HDFS
      [Diagram: HBase on top of HDFS]

  15. What is NoSQL?
      • A type of database
      • Doesn’t use the relational model
      • Good fit for distributed environments
      • NoSQL has very little to do with SQL (Structured Query Language); it should have been called “Not Only Relational Databases”
      • Schema-less / schema-free
      • Focus on performance over consistency

  16. What is a Stream of data?
      • An unbounded sequence of event data
      • Stream processing is continuous
      • Aggregation is based on temporal windows
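Aggregation over temporal windows can be sketched with a Python generator. This is a minimal illustration of a tumbling (fixed, non-overlapping) window count, assuming events arrive in timestamp order; the function name, the 10-second window, and the sample events are made up for the example, and windows with no events are simply not emitted.

```python
def tumbling_counts(events, window_seconds):
    """Yield (window_start, event_count) each time a window closes.

    `events` is an iterable of (timestamp_seconds, payload) pairs that is
    assumed to be ordered by timestamp. It may be unbounded, because a
    result is emitted as soon as an event from a later window arrives.
    """
    current_start = None
    count = 0
    for timestamp, _payload in events:
        start = (timestamp // window_seconds) * window_seconds
        if current_start is None:
            current_start = start
        elif start != current_start:
            yield current_start, count
            current_start, count = start, 0
        count += 1
    if current_start is not None:
        # Only reached for a finite sample: flush the last open window.
        yield current_start, count


# Hypothetical sample of a stream: (timestamp in seconds, payload).
events = [(0, "a"), (3, "b"), (9, "c"), (10, "d"), (25, "e")]
windows = list(tumbling_counts(events, 10))
print(windows)  # [(0, 3), (10, 1), (20, 1)]
```

Because the stream never ends, the aggregate is never "finished"; each window produces its own result while processing continues.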

  17. Storm
      • An event processor for data streams
      • Defines a streaming topology that consists of:
        • Spouts: Consume data sources and emit streams that contain tuples
        • Bolts: Operate on tuples in streams
      • Storm topologies run continuously on streams of data
        • Real-time monitoring
        • Event aggregation and logging
      [Diagram: Spout, Bolt]
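The spout and bolt roles can be mimicked with Python generators. This sketch does not use the Storm API; the spout, both bolts, and the sensor data are hypothetical and run in a single process, whereas a real topology is distributed across a cluster.

```python
import itertools


def sensor_spout():
    """Spout: consume a data source and emit an endless stream of tuples."""
    for reading_id in itertools.count():
        # Hypothetical source: one sensor whose value cycles through 0..4.
        yield ("sensor-1", reading_id % 5)


def threshold_bolt(stream, minimum):
    """Bolt: pass on only the tuples whose value reaches the minimum."""
    for source, value in stream:
        if value >= minimum:
            yield (source, value)


def label_bolt(stream):
    """Bolt: turn each remaining tuple into a log-style string."""
    for source, value in stream:
        yield f"{source}:{value}"


# Wire the topology: spout -> threshold bolt -> label bolt.
topology = label_bolt(threshold_bolt(sensor_spout(), minimum=3))

# The topology never finishes on its own, so take only the first four results.
first_alerts = list(itertools.islice(topology, 4))
print(first_alerts)  # ['sensor-1:3', 'sensor-1:4', 'sensor-1:3', 'sensor-1:4']
```

The generators are lazy, so nothing runs until a result is requested, which loosely mirrors a topology that keeps processing for as long as tuples keep arriving.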

  18. Spark
      • A fast, general-purpose computation engine that supports in-memory operations
      • A unified stack for interactive, streaming, and predictive analysis
      • Can run in Hadoop clusters

  19. Solutions to Problems

  20. So, do you need big data?
      • Are your data volumes truly “big”?
      • Many times we regulate how much data we save
      • Are you collecting enough?
      • Is it needed?
      • Are you required to constantly accommodate new data?
      • Is your business transactional only?
      • How will you benefit from it?
      • Are you ready for it?
      • You will need to filter through the noise
      • Skills and expertise

  21. Demo
      • Create Spark Cluster in Azure HDInsight
      • Processing Big Data with Hive
      • Connect using Power BI Desktop
      • Predictive analysis with Spark

  22. Questions?

  23. Just like Jimi Hendrix… we love to get feedback. Please complete the session feedback forms.

  24. SQLBits - It's all about the community... Please visit Community Corner. This year we are trying to get more people to learn about the SQL Community, so if you would be happy to visit the Community Corner, we’d really appreciate it.
