Big Data Workloads Drawn from Real-time Analytics Scenarios Across Three Deployed Solutions
Tao Zhong, K. Doshi, Xi Tang, Ting Lou, Zhongyan Lu, Hong Li
Software and Services Group, Intel
Statement of faith: Real-time (low-latency) analytics will become more important to end users, if not for all queries, then for a non-trivial fraction of them.
We walk through three workload scenarios in this short presentation. Objective: generate ideas for workloads that reflect low-latency and high-throughput demands simultaneously. All three use cases described here are in deployment or in pre-deployment testing among Intel partners in the PRC.
Smart City Application: Detect and Prevent License Plate Fraud
[Diagram: stages labeled Capture, Extract, Store, Compute; data: Registration and Traffic History Records (RDBMS)]
SMART CITY Workload Solution Flow
[Diagram labels: Feed, Extraction System, Query, Registration Records, Retrieve, Merge, Integrate, File System, Persist, Evolve, Detect, Real-time Analytics, Notify, Enforcement; components lettered A to F, steps numbered 1 to 5]
SMART CITY Workload Characteristics
• Structured and unstructured data
• Transactional and analytic activities
• Scale-out in-memory processing combined with distributed persistent data stores
• Real-time and batch operations
• Information inflows from sensor and non-sensor devices
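As a rough illustration of the Query and Detect steps in this flow, the sketch below checks one extracted plate sighting against registration records. The slides do not describe the actual detection rule, so the rule used here (flag a plate that is unregistered, or whose observed make and color differ from the registered ones) and every name in it (`check_sighting`, the `make` and `color` fields) are assumptions made only for illustration.

```python
def check_sighting(sighting, registrations):
    """Classify one plate sighting against registration records.

    Hypothetical rule, not taken from the slides: a sighting is suspicious
    if the plate is unknown or if the observed vehicle attributes do not
    match the registered ones.
    """
    record = registrations.get(sighting["plate"])
    if record is None:
        return "unregistered"
    observed = (sighting["make"], sighting["color"])
    registered = (record["make"], record["color"])
    if observed != registered:
        return "mismatch"
    return "ok"


# Hypothetical registration records keyed by plate number.
REGISTRATIONS = {
    "A12345": {"make": "sedan", "color": "white"},
}
```

In a deployment like the one outlined on the slide, a check of this kind would sit in the low-latency path, while merging new sightings into traffic history could proceed in batch.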
Rapid Content Management -- Solution Flow
[Diagram labels: New Media, Traditional Media, Information Accumulation over time, Data Analysis Logic, Search, Hibernate Driver, HBase Driver, Hive Dialect, Digest and Cross Reference, Hive, HBase, RDBMS, sparse edits, Log, Extract and Transform, Sqoop, bulk move older data]
Rapid Content Management – Workload Characteristics
• Structured and unstructured data
• Transactional and analytic activities
• Fast searches over “hot” data, slow searches over the rest
• RDBMS ops mixed with HBase
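The "fast searches over hot data, slow searches over the rest" characteristic can be sketched as a two-tier store. This is a minimal in-memory stand-in, not the deployed design: the class `TieredStore` and its methods are hypothetical, with a dictionary playing the role of the hot tier and a plain list playing the role of the older, bulk-moved data.

```python
class TieredStore:
    """Toy two-tier store: recent ("hot") items and older ("cold") items."""

    def __init__(self):
        self.hot = {}    # doc_id -> text, searched on every query
        self.cold = []   # (doc_id, text) pairs, searched only on request

    def add(self, doc_id, text):
        # New content always lands in the hot tier.
        self.hot[doc_id] = text

    def age_out(self, doc_id):
        # Stand-in for the bulk move of older data out of the hot tier.
        text = self.hot.pop(doc_id)
        self.cold.append((doc_id, text))

    def search(self, term, include_cold=False):
        # Fast path: hot tier only. Slow path: also scan the cold tier.
        hits = [doc_id for doc_id, text in self.hot.items() if term in text]
        if include_cold:
            hits += [doc_id for doc_id, text in self.cold if term in text]
        return hits
```

A benchmark built on this idea would issue most queries with `include_cold=False` and a smaller fraction with `include_cold=True`, which mirrors the mix of fast and slow operations the slide describes.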
Telecom Payment Fraud Detection/Prevention -- Solution Flow
[Diagram labels: Recharge Transaction, Credit Records, Transactions History, Mid-transaction Analytics, ALERT]
SELECT phone_number, SUM(charge_time), SUM(charge_amount)
FROM trans_table
GROUP BY phone_number
HAVING SUM(charge_time) > threshold_1 AND SUM(charge_amount) > threshold_2
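The query on this slide filters on aggregates, which standard SQL expresses with GROUP BY and HAVING rather than WHERE. The sketch below runs that form against an in-memory SQLite table to show the intended result. The column types, the sample rows, and the function name `flag_suspicious` are assumptions; the slide does not say whether `charge_time` is a count or a duration, so it is treated here as a plain number.

```python
import sqlite3


def flag_suspicious(rows, threshold_1, threshold_2):
    """Return (phone_number, total_time, total_amount) for numbers whose
    summed charge_time and summed charge_amount both exceed the thresholds."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema; only the table and column names come from the slide.
    conn.execute(
        "CREATE TABLE trans_table ("
        "phone_number TEXT, charge_time INTEGER, charge_amount REAL)"
    )
    conn.executemany("INSERT INTO trans_table VALUES (?, ?, ?)", rows)
    cursor = conn.execute(
        """
        SELECT phone_number, SUM(charge_time), SUM(charge_amount)
        FROM trans_table
        GROUP BY phone_number
        HAVING SUM(charge_time) > ? AND SUM(charge_amount) > ?
        ORDER BY phone_number
        """,
        (threshold_1, threshold_2),
    )
    result = cursor.fetchall()
    conn.close()
    return result


# Made-up sample transactions for illustration only.
SAMPLE_ROWS = [
    ("111", 3, 50.0),
    ("111", 4, 70.0),
    ("222", 1, 10.0),
]
```

In a mid-transaction setting the same aggregation would presumably be restricted to a recent time window, but the slide does not show one, so none is added here.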
Summary
• Workload scenarios from several “real life” use cases
• Blend of SQL and NoSQL approaches
• Recent data is available for queries nearly instantaneously
• Real-time responsiveness combined with high data volumes
• Mix of slow and fast operations (low-latency analytics on recent data, complex analytics on historical data)