HDInsight vs. Hortonworks HDP
HDInsight: Seamlessly scales in the cloud. Backed by Azure Storage Vault (ASV)/Azure Blob Storage.
Hortonworks HDP: On-premise or VM. Based on HDFS.
HDInsight vs. Hortonworks HDP
HDInsight: Lack of community support. Untested at the scale of a traditional Hadoop setup in a production setting. Lack of a clear migration path to an alternative Hadoop setup. Reliance on MS to bake in required Hadoop tools.
Hortonworks HDP: Huge community support. Can be set up on a multitude of Linux and Windows VMs. Migration to alternate platforms is a known quantity. Support for new tools such as MRv2 or YARN is quickly available.
[Diagram: Reporting Tools, Data Warehouse, Hive ODBC, ODBC, Hadoop/HDFS, Cassandra, Azure SQL, Sqoop, MapReduce]
Problem… Hadoop is great for batch processing of millions of records. What about real-time processing?
[Diagram: Trustev API, Azure Queue, Azure Worker Roles, Data Warehouse]
Message routing… Can be complex, brittle and hard to scale
[Diagram: three Azure Queues]
Message routing… Routing must be re-configured when scaling out
[Diagram: four Azure Queues]
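One way to see why routing has to be re-configured when scaling out is the sketch below. It assumes, purely for illustration, that messages are routed to a queue by merchant ID modulo the number of queues; the slides do not say how routing was actually done, and `route` and the integer merchant IDs are hypothetical.

```python
def route(merchant_id: int, queue_count: int) -> int:
    """Pick a queue index for a merchant (assumed modulo routing, not from the slides)."""
    return merchant_id % queue_count


# Twelve hypothetical merchants, routed first across 3 queues, then across 4.
merchants = list(range(12))
before = {m: route(m, 3) for m in merchants}
after = {m: route(m, 4) for m in merchants}

# Merchants whose queue changes once a fourth queue is added.
moved = [m for m in merchants if before[m] != after[m]]
print(f"{len(moved)} of {len(merchants)} merchants change queue")
```

Under this assumed scheme, adding a single queue re-homes most merchants, so every producer and consumer has to agree on the new routing at the same time.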
And… Definitions of fraud detection algorithms, weightings and rules get trapped in a release cycle. Fraud moves too fast!
Enter… Apache Storm. Doing for real-time data what Hadoop did for batch processing.
[Diagram: Trustev API, Azure Queue, Storm Cluster, Shared Algorithms, ML Generated Algorithms, Data Warehouse]
Tuples: Ordered list of elements. Named list of values of any type.
Streams: Unbounded sequence of tuples. Can come from multiple sources, like the Twitter API or bolts.
Spout: Source of a stream. Can talk with queues, logs, API calls, event data.
Bolts: Process tuples, create new streams. Apply functions, transforms, filter, aggregate, join, and access DBs, APIs etc.
Storm topologies
Are a directed graph of spouts and bolts. Using the correct tools, topologies can be created by fraud analysts and conversion analysts and, most importantly, automatically created and published using machine learning.
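To make the spout, bolt and topology idea concrete, here is a minimal plain-Python sketch. It does not use Storm's API. The transaction fields, the scoring rule and every name in it (`transaction_spout`, `score_bolt`, `block_bolt`, `run`) are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Iterator, List, Optional

Record = Dict[str, object]  # a "tuple": a named list of values of any type


def transaction_spout() -> Iterator[Record]:
    """Spout: source of the stream (a fixed list here, standing in for a queue)."""
    yield {"merchant": "A", "amount": 20.0}
    yield {"merchant": "A", "amount": 950.0}
    yield {"merchant": "B", "amount": 15.0}


def score_bolt(record: Record) -> Optional[Record]:
    """Bolt: transform each tuple by attaching a made-up risk score."""
    return {**record, "score": 0.9 if record["amount"] > 500 else 0.1}


def block_bolt(record: Record) -> Optional[Record]:
    """Bolt: filter, keeping only tuples that look fraudulent."""
    return record if record["score"] >= 0.5 else None


# The topology is a directed graph: each node lists its downstream nodes.
TOPOLOGY: Dict[str, List[str]] = {"spout": ["score"], "score": ["block"], "block": []}
BOLTS: Dict[str, Callable[[Record], Optional[Record]]] = {
    "score": score_bolt,
    "block": block_bolt,
}


def run(topology, bolts, source) -> List[Record]:
    """Push every tuple from the spout through the graph; collect leaf outputs."""
    results: List[Record] = []
    for record in source:
        pending = [(node, record) for node in topology["spout"]]
        while pending:
            node, item = pending.pop()
            out = bolts[node](item)
            if out is None:  # the bolt filtered this tuple out
                continue
            if not topology[node]:  # leaf node: emit as a result
                results.append(out)
            pending.extend((nxt, out) for nxt in topology[node])
    return results


blocked = run(TOPOLOGY, BOLTS, transaction_spout())
print(blocked)
```

Because the graph is plain data (`TOPOLOGY`), a tool or an ML job could in principle generate and publish a new one without a code release, which is the point the slide makes.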
Merchant A has a fraud problem that needs solving quickly. Merchant A can use our Shared Algorithm topology to immediately block common fraud problems.
[Diagram: Data Warehouse]
Merchant A has been on our system for an extended period of time, and our system knows better what their fraud problem actually looks like. Our ML systems create a new topology to better deal with Merchant A’s fraud problem.
[Diagram: Data Warehouse]
Hadoop + Storm
Hadoop: Batch processing system that can churn through huge volumes of data.
Storm: Real-time complex event processing system that can process data streams.
This gives us our Lambda Architecture. Real Time Big Data = Storm Process + Hadoop Process. Use the historical data produced by Hadoop to make your real-time results faster and more accurate.
Speed Layer: Only new data. Compensates for high-latency ‘Serving Layer’ updates. ‘Batch Layer’ overrides ‘Speed Layer’.
Serving Layer: Loads and exposes the batch views for querying. Random access to batch views.
Batch Layer: Immutable, constantly growing datasets. Batch views computed from this raw dataset.
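The interplay of the three layers can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the deck's implementation: the events are hypothetical flagged transactions counted per merchant, and `batch_view`, `speed_view` and `query` are made-up names.

```python
from collections import Counter
from typing import Dict, List

Event = Dict[str, str]


def batch_view(master_dataset: List[Event]) -> Counter:
    """Batch layer: recompute the view from the whole immutable raw dataset."""
    return Counter(event["merchant"] for event in master_dataset)


def speed_view(recent_events: List[Event]) -> Counter:
    """Speed layer: covers only new data not yet absorbed by a batch run."""
    return Counter(event["merchant"] for event in recent_events)


def query(merchant: str, batch: Counter, speed: Counter) -> int:
    """Serving side: answer from the batch view plus the real-time increment."""
    return batch[merchant] + speed[merchant]


master = [{"merchant": "A"}, {"merchant": "A"}, {"merchant": "B"}]
recent = [{"merchant": "A"}]  # arrived after the last batch run

before = query("A", batch_view(master), speed_view(recent))

# The next batch run absorbs the recent events; the batch view now
# overrides what the speed layer was holding, so the speed view is cleared.
master = master + recent
recent = []
after = query("A", batch_view(master), speed_view(recent))

print(before, after)
```

The answer is the same before and after the batch run. Only where the data is served from changes, which is how the speed layer compensates for the batch layer's latency.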
You can build this out in hours! A simple combination of Azure Queues, SQL Azure, and Azure VMs running Cassandra, Hadoop and Storm.