ScaleDB: Persistence for Stream Data

ScaleDB: Big Fast Data w/MariaDB
[Chart: Data Velocity (Driven by Performance) vs. Data Volume (Driven by Cost – DRAM vs. Disk)]
• In-Memory: SAP HANA, BigQuery
• High-Velocity / Disk: ScaleDB
• Disk: MariaDB, Oracle, SQL Server, etc.
• Disk: Hadoop

Demo
• Payment Table
  * P.K.
  * FK: Account, Time
  * Fields: Store, Amount, Coupon
• Inserts
• Lookup by Primary Key
• Lookup by Account (Foreign Key)
• Complex queries - BI & analytics
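The demo operations listed above can be sketched in a few lines. This is a minimal illustration only: it uses Python's built-in sqlite3 module as a stand-in for MariaDB with ScaleDB, and the column types, the `account` parent table, the index name, and the sample rows are all assumptions that do not come from the slides.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for MariaDB + ScaleDB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (id INTEGER PRIMARY KEY);
CREATE TABLE payment (
    id      INTEGER PRIMARY KEY,                      -- P.K.
    account INTEGER NOT NULL REFERENCES account(id),  -- FK: Account
    time    TEXT NOT NULL,                            -- Time
    store   TEXT,                                     -- Fields: Store,
    amount  REAL,                                     --         Amount,
    coupon  TEXT                                      --         Coupon
);
CREATE INDEX payment_account ON payment(account);     -- supports lookup by account
""")

# Inserts (made-up sample rows).
conn.executemany("INSERT INTO account (id) VALUES (?)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO payment VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "2014-01-01T10:00:00", "north", 20.0, None),
        (2, 1, "2014-01-01T10:05:00", "south", 5.5, "SAVE5"),
        (3, 2, "2014-01-01T10:07:00", "north", 12.0, None),
    ],
)

# Lookup by primary key.
by_pk = conn.execute(
    "SELECT store, amount FROM payment WHERE id = ?", (2,)
).fetchone()

# Lookup by account (foreign key).
by_account = conn.execute(
    "SELECT id FROM payment WHERE account = ? ORDER BY id", (1,)
).fetchall()

# A simple analytic query: payment count and revenue per store.
per_store = conn.execute(
    "SELECT store, COUNT(*), SUM(amount) FROM payment "
    "GROUP BY store ORDER BY store"
).fetchall()

print(by_pk, by_account, per_store)
```

The same three query shapes (point lookup, secondary-index lookup, aggregate) are what the demo exercises while inserts continue.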

Demo

ScaleDB’s Solution
• 1M Inserts/Second (indexed) with Simultaneous Queries
• Commodity “Cloud” Instance
  * Total: 6 Nodes, 48 cores, 0.2TB main memory
  * ~1M inserts/second, cost is less than $15,000
• SAP HANA (In-memory DBMS)
  * Cluster total: 100 Nodes, 4,000 cores, 100TB of main memory
  * “1.5M inserts/second” (Vishal Sikka, SAP TechEd)
  * In Memory: DRAM cost alone is ~$2M
• More Than 2 Orders of Magnitude Cost Advantage

Data Volumes are Exploding
• Tweets per Day
• iPhone Downloads
• AWS S3 & Dropbox Data Objects
…Driven by new data sources and data types
• Devices
• Social
• Log Files
• Analytics
• Business

Faster Insights = More Value (Complements Kinesis, Storm, etc.)
[Chart: Value of the Data to Users/Advertisers (Lower to Higher) vs. Response Latency (0 ms; milliseconds to minutes; later, possibly much later). Labeled product: Twitter Storm]

Big Data / Fast Data
Fast Data
• Real-Time Data
• Ad Hoc (SQL) Processing
• ScaleDB & Stream Processors
Big Data
• Pools of Data at Rest
• Batch (programmatic) Processing
• Hadoop
[Logos: Twitter Storm, MillWheel, BigQuery]

Hadoop’s Batch Processing
“…MapReduce technologies are good at handling large volumes of data. But they are fundamentally batch-based, and struggle with enabling real-time decisions on a never-ending—and never fully complete—stream of data.”
• Terry Hanold, Vice President of New Business Initiatives, Amazon AWS

Fast Data: The Car Metaphor
• Limited View / Real-Time Data, No Historical View
• Historical View, “Batch Lag”
• Real-Time Data, Historical View, SQL Support

DRAM Too Expensive for Stream Data
Media Costs Based upon Data Volume (DRAM vs. Disk)
• DRAM: $20,000 • $200,000 • $2,000,000 • $20,000,000
• Disk: $43 • $430 • $4,300 • $43,000
This is why Amazon uses disk-based S3 (non-DBMS) for Kinesis
• 1M inserts/second (100 byte rows), 24 hours = >8.5 TB/Day
• Disk Media Cost = ~$370
• DRAM Media Cost = ~$172,800 (>450X more)
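The daily figures on this slide follow from simple arithmetic, sketched below. The per-terabyte prices ($43 for disk, $20,000 for DRAM) are an assumption: they are read from the first entries of the cost rows and are consistent with the slide's daily totals, but the slide does not state the unit explicitly. Terabytes are taken as decimal (10^12 bytes).

```python
# Inputs taken from the slide.
ROWS_PER_SECOND = 1_000_000
ROW_BYTES = 100
SECONDS_PER_DAY = 24 * 60 * 60

# Assumed media prices per decimal terabyte (inferred, not stated on the slide).
DISK_USD_PER_TB = 43
DRAM_USD_PER_TB = 20_000

bytes_per_day = ROWS_PER_SECOND * ROW_BYTES * SECONDS_PER_DAY
tb_per_day = bytes_per_day / 1e12

disk_cost = tb_per_day * DISK_USD_PER_TB
dram_cost = tb_per_day * DRAM_USD_PER_TB
ratio = dram_cost / disk_cost

print(f"Data volume: {tb_per_day:.2f} TB/day")
print(f"Disk media:  ${disk_cost:,.0f}")
print(f"DRAM media:  ${dram_cost:,.0f}")
print(f"DRAM/disk:   {ratio:.0f}x")
```

Under these assumptions the stream produces about 8.64 TB per day, which costs roughly $372 on disk and $172,800 in DRAM, a ratio of about 465 to 1.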

But Data Volumes Increase 78% CAGR
According to IDC [1] and Gartner [2], data volumes have been measured to increase ten-fold every five years.
1. Gantz, John F. The Diverse and Exploding Digital Universe: An Updated Forecast of Worldwide Information Growth Through 2011. Tech. An IDC White Paper.
2. Paquet, Raymond. “Technology Trends You Can’t Afford to Ignore.” Lecture. Gartner Webinar. Gartner.com. Gartner Inc., Jan. 2010.
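The slide quotes growth both as an annual rate and as a five-year multiple. Compound growth relates the two forms, and the sketch below converts each figure into the other so they can be compared on the same scale. The function names are illustrative.

```python
def cagr_from_multiple(multiple: float, years: float) -> float:
    """Annual growth rate that compounds to `multiple` over `years`."""
    return multiple ** (1 / years) - 1


def multiple_from_cagr(rate: float, years: float) -> float:
    """Total growth multiple after `years` at annual growth `rate`."""
    return (1 + rate) ** years


# Ten-fold growth every five years, expressed as an annual rate.
tenfold_rate = cagr_from_multiple(10, 5)

# A 78% annual rate, expressed as a five-year multiple.
five_year_multiple = multiple_from_cagr(0.78, 5)

print(f"10x in 5 years  = {tenfold_rate:.1%} per year")
print(f"78% per year    = {five_year_multiple:.1f}x in 5 years")
```

By this arithmetic, ten-fold growth over five years corresponds to about 58% per year, while 78% per year compounds to roughly 18-fold over five years, so the two quoted figures describe somewhat different growth rates.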

In-Memory & Big Data
Data Volume Growth Dramatically Outpaces DRAM Affordability
[Charts: Increase Multiplier (Volume/Affordability) vs. Years]

ScaleDB: Big Fast Data w/MariaDB
1,000,000 Inserts per second
[Chart: Data Velocity (Driven by Performance) vs. Data Volume (Driven by Cost – DRAM vs. Disk)]
• In-Memory: SAP HANA, BigQuery
• High-Velocity / Disk: ScaleDB
• Disk: MariaDB, Oracle, SQL Server, etc.
• Disk: Hadoop
• BigQuery Cost: $86,400/day
• ScaleDB Cost*: $46/day
* AWS: $28 for 8.4TB storage, $18 for 6 instances of heavy usage EBS optimized

How it Works

Scaling the Database
[Diagram labels: MariaDB DBMS Instance (MariaDB with MyISAM, InnoDB, and ScaleDB; Data); ScaleDB Storage Instance (Storage)]

Scaling the Database Tier
[Diagram: four DBMS Instances, a ClusterManager, and two Storage Instances]

Scaling the Storage Tier
[Diagram: four DBMS Instances, a ClusterManager, and five Storage Instances]

High-Availability
• Mirrored Volumes
[Diagram: four DBMS Instances, a ClusterManager, and five Storage Instances]

NoSQL v. MySQL

Push-Down: Distributed Parallel Processing
• Push Processing to the Data
• Result: High-Performance Parallel Processing, Similar to Map/Reduce
[Diagram: Query and Response arrows between MariaDB • ScaleDB and three ScaleDB Storage nodes]
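The push-down idea can be illustrated with a small scatter/gather sketch: each storage node aggregates its own rows and returns only a small partial result, which the database node merges. This is not ScaleDB's implementation. The node layout, the function names, and the sample rows are hypothetical, and threads stand in for separate storage servers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical storage nodes, each holding a slice of (store, amount) rows.
STORAGE_NODES = [
    [("north", 20.0), ("south", 5.5)],
    [("north", 12.0)],
    [("south", 4.5), ("east", 8.0)],
]


def partial_sum(rows):
    """Runs "on the storage node": aggregate locally, return a small result."""
    totals = {}
    for store, amount in rows:
        totals[store] = totals.get(store, 0.0) + amount
    return totals


def pushed_down_sum(nodes):
    """Runs "on the DBMS node": query all nodes in parallel, merge responses."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(partial_sum, nodes))
    merged = {}
    for partial in partials:
        for store, total in partial.items():
            merged[store] = merged.get(store, 0.0) + total
    return merged


totals = pushed_down_sum(STORAGE_NODES)
print(totals)
```

Only the per-store partial sums cross the network in this scheme, rather than every row, which is the same division of labor as the map and reduce phases the slide compares it to.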

Customer Success Story

Customer Success Story: Statricks
• Target: 300M-450M Listings per Day
• From: eBay, Craigslist ….
• Processing:
  * Price trends
  * Listing Longevity
  * Spam Detection
  * Ad Metrics
  * Price Trend Time Series
  * Statistical Analysis

Thank You
