
Hortonworks HDP


mliss

Presentation Transcript


  1. HDInsight: scales seamlessly in the cloud; backed by Azure Storage Vault (ASV)/Azure Blob Storage. Hortonworks HDP: on-premise or VM; based on HDFS.

  2. HDInsight: lack of community support; untested at the scale of a traditional Hadoop setup in a production setting; lack of a clear migration path to an alternative Hadoop setup; reliance on MS to bake in required Hadoop tools. Hortonworks HDP: huge community support; can be set up on a multitude of Linux and Windows VMs; migration to alternate platforms is a known quantity; support for new tools such as MRv2 or YARN is quickly available.

  3. (Architecture diagram; labels: Reporting Tools, Data Warehouse, Hive ODBC, ODBC, Hadoop/HDFS, Cassandra, Azure SQL, Sqoop, MapReduce.)

  4. Problem… Hadoop is great for batch processing of millions of records. What about real-time processing?

  5. (Architecture diagram; labels: Azure Worker Roles, Data Warehouse, Azure Queue, Trustev API.)

  6. Message routing… Can be complex, brittle and hard to scale

  7. (Diagram: three Azure Queues.)

  8. Message routing… Routing must be re-configured when scaling out
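
One way to see why scaling out forces a routing change is the sketch below. It assumes, purely for illustration, that messages are routed by hashing a key modulo the number of queues; the slides do not say which routing scheme was actually used, and the `route` function and `merchant-N` keys are hypothetical. For a uniform hash, about three quarters of keys would be expected to land on a different queue when going from three queues to four.

```python
import zlib


def route(key: str, queue_count: int) -> int:
    """Pick a queue index for a message key (assumed scheme: hash modulo N)."""
    # crc32 is used instead of hash() so the result is stable across runs.
    return zlib.crc32(key.encode("utf-8")) % queue_count


# Hypothetical message keys, one per merchant.
keys = [f"merchant-{i}" for i in range(1000)]

# Keys whose target queue changes when a fourth queue is added.
moved = [k for k in keys if route(k, 3) != route(k, 4)]

print(f"{len(moved)} of {len(keys)} keys change queue when scaling from 3 to 4 queues")
```

Every producer and consumer that relies on the old mapping has to be updated at the same time, which is the brittleness the slides describe.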

  9. (Diagram: four Azure Queues.)

  10. And… The definitions of fraud detection algorithms, weightings and rules get trapped in a release cycle. Fraud moves too fast!

  11. Enter… Apache Storm. Doing for real-time data what Hadoop did for batch processing.

  12. (Architecture diagram; labels: Shared Algorithms, ML Generated Algorithms, Data Warehouse, Storm Cluster, Azure Queue, Trustev API.)

  13. Tuples: ordered list of elements; named list of values of any type. Streams: unbounded sequence of tuples; can come from multiple sources, like the Twitter API or bolts. Spout: source of a stream; can talk with queues, logs, API calls, event data. Bolts: process tuples and create new streams; apply functions, transforms, filter, aggregate, join and access DBs and APIs etc.

  14. Storm topologies are a directed graph of spouts and bolts. Using the correct tools, topologies can be created by fraud analysts and conversion analysts and, most importantly, automatically created and published using machine learning.
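
The vocabulary from slides 13 and 14 can be sketched in a few lines of plain Python. This is a toy model, not the Storm API: the function names, the sample transactions and the 500.0 threshold are all made up for illustration, and the topology is the simplest possible directed graph (a linear chain). A real Storm topology runs continuously and is distributed across a cluster.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Iterator, List

Event = Dict[str, object]  # a "tuple": a named list of values of any type


def transaction_spout() -> Iterator[Event]:
    """Spout: the source of a stream (a fixed list standing in for a queue)."""
    yield from [
        {"merchant": "A", "amount": 20.0},
        {"merchant": "A", "amount": 950.0},
        {"merchant": "B", "amount": 15.0},
        {"merchant": "A", "amount": 1200.0},
    ]


def high_value_filter(stream: Iterable[Event]) -> Iterator[Event]:
    """Bolt: filter, passing on only high-value transactions."""
    for event in stream:
        if event["amount"] >= 500.0:  # hypothetical rule threshold
            yield event


def flag_for_review(stream: Iterable[Event]) -> Iterator[Event]:
    """Bolt: transform, emitting a new stream with an extra field."""
    for event in stream:
        yield {**event, "flag": "review"}


def count_by_merchant(stream: Iterable[Event]) -> Counter:
    """Terminal bolt: aggregate flagged transactions per merchant."""
    counts: Counter = Counter()
    for event in stream:
        counts[event["merchant"]] += 1
    return counts


def run_topology(spout: Callable[[], Iterator[Event]],
                 bolts: List[Callable],
                 sink: Callable) -> Counter:
    """Wire spout -> bolts -> sink as a linear chain and drain the stream."""
    stream = spout()
    for bolt in bolts:
        stream = bolt(stream)
    return sink(stream)


flagged = run_topology(transaction_spout,
                       [high_value_filter, flag_for_review],
                       count_by_merchant)
print(dict(flagged))
```

Because the chain of bolts is just data passed to `run_topology`, a different list of bolts gives a different topology without changing the surrounding code, which is the property the slides rely on when they talk about analysts or ML systems publishing topologies.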

  15. Merchant A has a fraud problem that needs solving quickly. Merchant A can use our Shared Algorithm topology to immediately block common fraud problems. (Diagram label: Data Warehouse.)

  16. Merchant A has been on our system for an extended period of time, and our system knows better what their fraud problem actually looks like. Our ML systems create a new topology to better deal with Merchant A’s fraud problem. (Diagram label: Data Warehouse.)

  17. Hadoop + Storm. Hadoop: batch processing system that can churn through huge volumes of data. Storm: real-time complex event processing system that can process data streams.

  18. This gives us our Lambda Architecture. Real-time big data = Storm process + Hadoop process. Use the historical data produced by Hadoop to make your real-time results faster and more accurate. Speed Layer: only new data; compensates for high-latency ‘Serving Layer’ updates; the ‘Batch Layer’ overrides the ‘Speed Layer’. Serving Layer: loads and exposes the batch views for querying; random access to batch views. Batch Layer: immutable, constantly growing datasets; batch views computed from this raw dataset.
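
The three layers on slide 18 can be illustrated with a minimal in-memory sketch. It assumes a simple per-merchant event count as the view being served; the names `build_batch_view`, `SpeedLayer` and `query` are hypothetical, and in the architecture described here the batch side would be Hadoop and the speed side Storm rather than Python objects.

```python
from collections import Counter


def build_batch_view(master_dataset):
    """Batch layer: recompute the view from the full, append-only dataset."""
    view = Counter()
    for event in master_dataset:
        view[event["merchant"]] += 1
    return view


class SpeedLayer:
    """Speed layer: covers only events that arrived since the last batch run."""

    def __init__(self):
        self.view = Counter()

    def on_event(self, event):
        self.view[event["merchant"]] += 1

    def reset(self):
        # Called once a fresh batch view overrides the real-time results.
        self.view.clear()


def query(batch_view, speed_view, merchant):
    """Serving side: merge the batch view with the real-time increment."""
    return batch_view[merchant] + speed_view[merchant]


# Historical data already processed by the batch layer.
master = [{"merchant": "A"}, {"merchant": "A"}, {"merchant": "B"}]
batch_view = build_batch_view(master)

# New events arrive: append to the master dataset and update the speed layer.
speed = SpeedLayer()
for event in [{"merchant": "A"}, {"merchant": "B"}]:
    master.append(event)
    speed.on_event(event)
before_batch = query(batch_view, speed.view, "A")

# The next batch run absorbs the new events, so the speed layer is cleared.
batch_view = build_batch_view(master)
speed.reset()
after_batch = query(batch_view, speed.view, "A")

print(before_batch, after_batch)
```

The answer for a merchant is the same before and after the batch run; the speed layer only fills the gap while the batch view is stale.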

  19. You can build this out in hours! A simple combination of Azure Queues, SQL Azure, and Azure VMs running Cassandra, Hadoop and Storm.
