ScaleDB: Persistence for Stream Data
ScaleDB: Big Fast Data w/ MariaDB
[Chart: data velocity (driven by performance) vs. data volume (driven by cost: DRAM vs. disk)]
• In-memory: SAP HANA, BigQuery
• High-velocity / disk: ScaleDB
• Disk: MariaDB, Oracle, SQL Server, etc.
• Disk: Hadoop
Demo
• Payment table
  * Primary key (P.K.)
  * FK: Account, Time
  * Fields: Store, Amount, Coupon
• Inserts
• Lookup by primary key
• Lookup by account (foreign key)
• Complex queries: BI & analytics
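The demo's four operations can be sketched with Python's built-in sqlite3 module standing in for MariaDB. This is an assumption: the slides do not show the demo's DDL, engine settings, or code. The column names payment_id and ts, the index name idx_payment_account_ts, the coupon code, and the synthetic data are all hypothetical.

```python
import random
import sqlite3

# SQLite in memory stands in for MariaDB here (assumption: the slides do not
# show the demo's actual schema or engine settings).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE payment (
        payment_id INTEGER PRIMARY KEY,  -- hypothetical surrogate primary key
        account    INTEGER NOT NULL,     -- foreign key to an accounts table (not modeled)
        ts         INTEGER NOT NULL,     -- payment time, Unix seconds
        store      TEXT    NOT NULL,
        amount     REAL    NOT NULL,
        coupon     TEXT                  -- NULL when no coupon was used
    )
    """
)
# Secondary index on (account, ts) so lookups by account avoid a full scan.
conn.execute("CREATE INDEX idx_payment_account_ts ON payment (account, ts)")

# Inserts: 10,000 synthetic payments in one batched transaction.
rng = random.Random(42)
rows = [
    (i, rng.randrange(100), 1_700_000_000 + i, f"store-{i % 5}",
     round(rng.uniform(1, 500), 2), "SPRING10" if i % 10 == 0 else None)
    for i in range(1, 10_001)
]
with conn:  # commits the transaction on success
    conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?, ?, ?)", rows)

# Lookup by primary key.
by_pk = conn.execute("SELECT * FROM payment WHERE payment_id = ?", (42,)).fetchone()

# Lookup by account (the foreign key), most recent first.
by_account = conn.execute(
    "SELECT payment_id, ts, amount FROM payment"
    " WHERE account = ? ORDER BY ts DESC LIMIT 5",
    (7,),
).fetchall()

# BI-style aggregate: payment count, revenue, and coupon use per store.
per_store = conn.execute(
    "SELECT store, COUNT(*), ROUND(SUM(amount), 2), COUNT(coupon)"
    " FROM payment GROUP BY store ORDER BY store"
).fetchall()
```

SQLite's EXPLAIN QUERY PLAN can confirm that the account lookup is served by the secondary index rather than a full table scan.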
ScaleDB’s Solution: 1M Inserts/Second (Indexed) with Simultaneous Queries
• Commodity “cloud” instances
  * Total: 6 nodes, 48 cores, 0.2 TB main memory
  * ~1M inserts/second; cost is less than $15,000
• SAP HANA (in-memory DBMS)
  * Cluster total: 100 nodes, 4,000 cores, 100 TB of main memory
  * “1.5M inserts/second” (Vishal Sikka, SAP TechEd)
  * In memory: DRAM cost alone is ~$2M
More than two orders of magnitude cost advantage
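The cost comparison is arithmetic on the slide's own figures. This sketch computes both the ratio of total costs and the ratio after normalizing for throughput. Because $15,000 is an upper bound and the HANA figure counts DRAM only, both ratios are lower bounds given these inputs. The variable names are illustrative.

```python
# Figures as quoted on the slide (no independent source).
scaledb_cost_usd = 15_000          # "less than $15,000", so an upper bound
scaledb_inserts_per_s = 1_000_000  # "~1M inserts/second"
hana_dram_cost_usd = 2_000_000     # "DRAM cost alone is ~$2M"
hana_inserts_per_s = 1_500_000     # "1.5M inserts/second"

# Ratio of total costs, as the slide compares them.
raw_ratio = hana_dram_cost_usd / scaledb_cost_usd

# Ratio of cost per sustained insert/second, adjusting for HANA's higher quoted rate.
scaledb_usd_per_ips = scaledb_cost_usd / scaledb_inserts_per_s
hana_usd_per_ips = hana_dram_cost_usd / hana_inserts_per_s
normalized_ratio = hana_usd_per_ips / scaledb_usd_per_ips

print(f"total-cost ratio: {raw_ratio:.0f}x")
print(f"cost per insert/s ratio: {normalized_ratio:.0f}x")
```

The total-cost ratio (about 133×) is what supports the "more than two orders of magnitude" headline; per unit of insert throughput, the same figures give roughly 89×.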
Data Volumes are Exploding
[Charts: tweets per day; iPhone downloads; AWS S3 & Dropbox data objects]
…driven by new data sources and data types: devices, social, log files, analytics, business
Faster Insights = More Value (Complements Kinesis, Storm, etc.)
[Chart: value of the data to users/advertisers (lower to higher) vs. response latency (0 ms; milliseconds to minutes; later, possibly much later), with Twitter Storm marked]
Big Data vs. Fast Data
• Fast data: real-time data; ad hoc (SQL) processing; ScaleDB & stream processors
• Big data: pools of data at rest; batch (programmatic) processing; Hadoop
[Logos shown: Twitter Storm, MillWheel, BigQuery]
Hadoop’s Batch Processing
• “…MapReduce technologies are good at handling large volumes of data. But they are fundamentally batch-based, and struggle with enabling real-time decisions on a never-ending—and never fully complete—stream of data.”
• Terry Hanold, Vice President of New Business Initiatives, Amazon AWS
Fast Data: The Car Metaphor
[Figure, three panels:
• Limited view / real-time data: no historical view
• Historical view with “batch lag”
• Real-time data + historical view + SQL support]
DRAM Too Expensive for Stream Data
Media costs based upon data volume (DRAM vs. disk):
• 1 TB: DRAM $20,000; disk $43
• 10 TB: DRAM $200,000; disk $430
• 100 TB: DRAM $2,000,000; disk $4,300
• 1,000 TB: DRAM $20,000,000; disk $43,000
This is why Amazon uses disk-based S3 (non-DBMS) for Kinesis.
• 1M inserts/second (100-byte rows) for 24 hours = >8.5 TB/day
• Disk media cost ≈ $370
• DRAM media cost ≈ $172,800 (>450× more)
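The slide's daily-volume figures can be reproduced directly. A minimal sketch, assuming decimal terabytes (10^12 bytes) and the per-terabyte prices implied by the cost table ($20,000 for DRAM, $43 for disk); the constant names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
inserts_per_s = 1_000_000
row_bytes = 100

bytes_per_day = inserts_per_s * row_bytes * SECONDS_PER_DAY
tb_per_day = bytes_per_day / 1e12    # decimal terabytes

# Per-TB media prices implied by the slide's cost table (assumption).
DRAM_USD_PER_TB = 20_000
DISK_USD_PER_TB = 43

disk_cost = tb_per_day * DISK_USD_PER_TB
dram_cost = tb_per_day * DRAM_USD_PER_TB

print(f"{tb_per_day:.2f} TB/day")
print(f"disk ${disk_cost:,.0f}, DRAM ${dram_cost:,.0f}, ratio {dram_cost / disk_cost:.0f}x")
```

The result is 8.64 TB/day; disk comes to about $372 (the slide's ~$370) and DRAM to $172,800, a ratio of about 465×.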
But Data Volumes Increase: 78% CAGR
According to IDC [1] and Gartner [2], data volumes have been measured to increase ten-fold every five years.
1. Gantz, John F. The Diverse and Exploding Digital Universe: An Updated Forecast of Worldwide Information Growth Through 2011. IDC White Paper.
2. Paquet, Raymond. “Technology Trends You Can’t Afford to Ignore.” Gartner Webinar, Gartner.com, Gartner Inc., Jan. 2010.
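The two growth figures on this slide can be checked against each other with compound-growth arithmetic. Ten-fold every five years implies roughly 58.5% CAGR, while 78% CAGR compounds to about 17.9× over five years and passes 10× in about four. The helper names below are illustrative.

```python
def cagr(multiplier: float, years: float) -> float:
    """Compound annual growth rate that yields `multiplier` over `years`."""
    return multiplier ** (1 / years) - 1


def growth_multiplier(rate: float, years: float) -> float:
    """Total growth after compounding `rate` annually for `years`."""
    return (1 + rate) ** years


tenfold_rate = cagr(10, 5)                    # ten-fold every five years
five_year_at_78 = growth_multiplier(0.78, 5)  # 78% CAGR held for five years

print(f"10x over 5 years -> {tenfold_rate:.1%} CAGR")
print(f"78% CAGR over 5 years -> {five_year_at_78:.1f}x")
```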
In-Memory & Big Data
Data volume growth dramatically outpaces DRAM affordability.
[Two charts: increase multiplier (volume/affordability) vs. years]
ScaleDB: Big Fast Data w/ MariaDB, 1,000,000 Inserts per Second
[Chart: data velocity (driven by performance) vs. data volume (driven by cost: DRAM vs. disk); in-memory: SAP HANA, BigQuery; high-velocity / disk: ScaleDB; disk: MariaDB, Oracle, SQL Server, etc.; disk: Hadoop]
• BigQuery cost: $86,400/day
• ScaleDB cost*: $46/day
* AWS: $28 for 8.4 TB storage, $18 for 6 heavy-usage, EBS-optimized instances
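The daily cost figures reduce to per-insert costs with a little arithmetic. This sketch uses only the slide's quoted numbers; it does not model BigQuery's or AWS's actual price lists, and the variable names are illustrative.

```python
SECONDS_PER_DAY = 86_400
INSERTS_PER_SECOND = 1_000_000

bigquery_usd_per_day = 86_400   # as quoted on the slide
scaledb_usd_per_day = 28 + 18   # AWS: $28 storage + $18 for six instances, as quoted

# 1M inserts/second for a day = 86,400 million inserts.
million_inserts_per_day = INSERTS_PER_SECOND * SECONDS_PER_DAY / 1_000_000

bigquery_usd_per_million = bigquery_usd_per_day / million_inserts_per_day
scaledb_usd_per_million = scaledb_usd_per_day / million_inserts_per_day
cost_ratio = bigquery_usd_per_day / scaledb_usd_per_day

print(f"BigQuery: ${bigquery_usd_per_million:.4f} per million inserts")
print(f"ScaleDB:  ${scaledb_usd_per_million:.4f} per million inserts")
print(f"ratio: {cost_ratio:,.0f}x")
```

At 1M inserts/second, $86,400/day works out to $1 per million inserts; the $46/day figure is about $0.0005 per million, a ratio of roughly 1,900×.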
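Scaling the Database
[Diagram: a MariaDB DBMS instance with the MyISAM, InnoDB, and ScaleDB storage engines; MyISAM and InnoDB keep their data locally, while ScaleDB data lives in a separate ScaleDB storage instance]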
Scaling the Database Tier
[Diagram: four DBMS instances and a cluster manager sharing two storage instances]
Scaling the Storage Tier
[Diagram: four DBMS instances and a cluster manager over five storage instances]
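The slides do not say how ScaleDB places data across storage instances. As a hypothetical illustration of how adding storage instances can spread data without reshuffling all of it, here is a consistent-hashing ring; the HashRing class, node names, and virtual-node count are assumptions for this sketch, not ScaleDB's design.

```python
import bisect
import hashlib
from collections import Counter


def stable_hash(key: str) -> int:
    # 64-bit hash that is stable across processes (built-in hash() is salted).
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring mapping row keys to storage instances."""

    def __init__(self, nodes, vnodes: int = 64):
        self.vnodes = vnodes   # virtual points per node, to even out load
        self._points = []      # sorted positions on the ring
        self._owner = {}       # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            point = stable_hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._points, stable_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


keys = [f"payment:{i}" for i in range(20_000)]
ring = HashRing([f"storage-{i}" for i in range(2)])
before = {k: ring.node_for(k) for k in keys}

for i in range(2, 5):  # grow the storage tier from 2 to 5 instances
    ring.add(f"storage-{i}")
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
load = Counter(after.values())
print(f"moved {len(moved) / len(keys):.0%} of keys; load per instance: {sorted(load.values())}")
```

With consistent hashing, growing from two to five instances moves only the keys claimed by the new instances (ideally about 3/5 of them), whereas a naive `hash(key) % N` scheme would remap most keys.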
High-Availability
[Diagram: four DBMS instances and a cluster manager over five storage instances with mirrored volumes]
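The slide names mirrored volumes as the high-availability mechanism but shows no details. A minimal sketch of the general idea, assuming two-way mirroring where every write goes to all up-to-date copies and reads fall back to a surviving copy; StorageInstance, MirroredVolume, and the resync step are hypothetical names, not ScaleDB APIs.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison: two instances are never "equal"
class StorageInstance:
    name: str
    up: bool = True
    stale: bool = False   # missed writes while down; must resync before serving reads
    blocks: dict = field(default_factory=dict)


class MirroredVolume:
    """Hypothetical two-way mirror: writes go to every up-to-date copy."""

    def __init__(self, primary: StorageInstance, mirror: StorageInstance):
        self.copies = [primary, mirror]

    def _current(self):
        return [c for c in self.copies if c.up and not c.stale]

    def write(self, block_id: int, data: bytes) -> None:
        targets = self._current()
        if not targets:
            raise IOError("no up-to-date copy available")
        for c in self.copies:
            if c in targets:
                c.blocks[block_id] = data
            else:
                c.stale = True  # this copy missed the write

    def read(self, block_id: int) -> bytes:
        for c in self._current():
            return c.blocks[block_id]  # first up-to-date copy serves the read
        raise IOError("no up-to-date copy available")

    def resync(self, copy: StorageInstance) -> None:
        # Rebuild a recovered copy from a current one before it serves reads.
        source = next(c for c in self._current() if c is not copy)
        copy.blocks = dict(source.blocks)
        copy.stale = False


a, b = StorageInstance("storage-1"), StorageInstance("storage-2")
vol = MirroredVolume(a, b)
vol.write(1, b"row-1")
a.up = False                  # one storage instance fails
after_failure = vol.read(1)   # served by the surviving mirror
vol.write(2, b"row-2")        # a misses this write and is marked stale
a.up = True
vol.resync(a)                 # bring a back in sync before trusting it again
```

The stale flag matters: a copy that was down during a write must not serve reads until it has been rebuilt from a current copy.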
Push-Down: Distributed Parallel Processing
[Diagram: MariaDB with the ScaleDB engine sends queries to several ScaleDB storage nodes, each returning a response]
• Push processing to the data
• Result: high-performance parallel processing, similar to Map/Reduce
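Push-down can be sketched as scatter/gather: the DBMS sends the filter and a partial aggregate to every storage node, each node scans only its own shard, and the DBMS merges the small partial results, much like map and reduce. This hypothetical illustration computes roughly `SELECT account, SUM(amount), COUNT(*) FROM payment WHERE amount >= ? GROUP BY account`; the shard contents and function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical: each storage node holds one shard of (account, amount) rows.
SHARDS = [
    [("acct-1", 12.5), ("acct-2", 40.0)],
    [("acct-1", 7.5), ("acct-3", 100.0)],
    [("acct-2", 10.0), ("acct-1", 30.0)],
]


def storage_node_query(shard, min_amount):
    """Runs on a storage node: filter and pre-aggregate locally (the 'map' step)."""
    partial = {}
    for account, amount in shard:
        if amount >= min_amount:
            total, count = partial.get(account, (0.0, 0))
            partial[account] = (total + amount, count + 1)
    return partial  # only small partial aggregates travel back


def dbms_query(min_amount):
    """Runs on the DBMS node: fan out, then merge partial results (the 'reduce' step)."""
    totals = {}
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = pool.map(lambda shard: storage_node_query(shard, min_amount), SHARDS)
        for partial in partials:
            for account, (total, count) in partial.items():
                t, c = totals.get(account, (0.0, 0))
                totals[account] = (t + total, c + count)
    return dict(sorted(totals.items()))


result = dbms_query(min_amount=10.0)
```

The design point is that filtering and partial aggregation happen where the rows live, so only a few small dictionaries cross the network instead of every row.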
Customer Success Story: Statricks
• Target: 300M–450M listings per day
• From: eBay, Craigslist, …
• Processing:
  * Price trends
  * Listing longevity
  * Spam detection
  * Ad metrics
  * Price trend time series
  * Statistical analysis
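For scale, the Statricks target converts to an average ingest rate as follows. This assumes listings arrive evenly across the day; bursty traffic would have higher peaks. The function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def avg_rate_per_second(items_per_day: int) -> float:
    """Average arrival rate if items are spread evenly over 24 hours."""
    return items_per_day / SECONDS_PER_DAY


for per_day in (300_000_000, 450_000_000):  # the slide's target range
    print(f"{per_day:,} listings/day ≈ {avg_rate_per_second(per_day):,.0f} listings/s")
```

That is roughly 3,500 to 5,200 listings per second on average.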