280 likes | 460 Views
NoSQL with MySQL. Yekesa Kosuru Distinguished Architect, Nokia Peter Zaitsev, Percona. Agenda. Part 1 : Problem statement Part 2 : Background (What is, Why, ACID, CAP) Part 3 : Building a Key-Value Store Part 4 : NoSQL with MySQL. Part 1: Problem statement. NoSQL : No Formal Definition.
E N D
NoSQL with MySQL Yekesa Kosuru Distinguished Architect, Nokia Peter Zaitsev, Percona
Agenda • Part 1 : Problem statement • Part 2 : Background (What is, Why, ACID, CAP) • Part 3 : Building a Key-Value Store • Part 4 : NoSQL with MySQL
NoSQL : No Formal Definition • No Formal Definition • Not Only SQL ? • Don’t use SQL ? • Limit expressive power ? • Underlying issues ? • ACID (Strong Consistency ?, Isolation ?) • SQL • Something else ? • Facts • Hardware will fail • Software will have bugs • Growing traffic, data volumes, unpredictable • No downtime or expose errors to users
Problem to solve • Key Value Store Problem Statement • A consumer facing system with foll. characteristics : • Host large volumes of data & queries • Responsive with predictable performance • Always available (survives failures, software/hardware upgrades) • Low cost of ownership • Minimal administration • Availability : How to service customers (24 X 7) despite failures, upgrades • Scalability : How to scale (capacity, transactions, costs…) • Predictability : Predictable responsiveness • Not suited for all applications
ACID • Transactions simplify job of developers • Client/developer point of view • Atomicity: Bundled operations • All of the operations (multiple updates, deletes, inserts) in the transaction will complete, or none will • Consistency: The database is in a consistent state when the transaction begins and ends • Referential integrity, and other constraints • All reads must see latest data • Isolation: The transaction will behave as if it is the only operation being performed upon the database. Serialized updates i.e locks • Durability: Upon completion of the transaction, the operation will not be reversed. • ACID in single database is efficient
RequestCommit Prepare Prepare Prepare Prepare Prepare Prepare Prepare Commit Prepare Prepare Prepare Prepared Issues Scaling DBMS • Databases extend ACID to span multiple nodes using 2PC • Shared disk & shared nothing architectures • Cost of 2PC (N nodes involved in transaction) • Consistency & Isolation over 2PC makes it expensive • 3N+1 Message Complexity • N+1 Stable Writes • Locks – lock resources • Availability is a pain point • 2PC is blocking protocol , uses single coordinator and not fault tolerant • Coordinator failure can block transaction indefinitely • Blocked transaction causes increased contention • Rebalancing • Upgrades • Total Cost of Ownership
CAP • Introduced by Prof. Brewer in year 2000 • Consistency, Availability, (Network) Partition Tolerance • Industry proved that we can guarantee two out of three • System point of view • Consistency: Defines rules for the apparent order and visibility of updates • Variations on consistency • Eventual Consistency - eventually replicas will be updated – order of updates may vary - all reads will eventually see it • Availability : All clients can read & write data to some replica, even in the presence of failures • Partition-tolerance : operations will complete, even if network is unavailable i.e disconnected network
Important Considerations • Functional • Simple API ( Get, Put, Delete - like Hashtable) • Primary key lookups only (limit expressive power) • No joins, No non-key lookups, No grouping functions, No constraints … • Data Model (Key-Value, Value = JSON) • Read-Your-Write Consistency • SLA • Highly Available & Horizontally Scalable • Performance (low latencies & predictable) • Uniform cost of queries • Operational • Symmetric, Rolling Upgrades, Backups, Alerts, Monitoring • Total Cost of Ownership (TCO) • Hardware, Software, Licenses …
Making it Available • Make N replicas of data • Use Quorum (R + W > N) • Overlap R & W to make it strongly consistent • Handle Faults (partition tolerance and self heal) • Real faults • Machine down, NIC down … • Eventual Consistency (data) • Prefer A & P in CAP • If a node is down during write, read repair later – let write proceed • Intermittent faults • X and Y are clients of Z; X can talk to Z but Y can’t • Server does not respond within timeout, stress scenarios … • Inconsistent cluster view can cause inconsistencies in data • If two concurrently writing clients have different views of cluster, resolve inconsistency later - let write proceed N = Number of replicas R = Number of Reads W = Number of Writes R + W > N 2 + 2 > 3
Making it Scalable • Sharding (or partition data) works well for Capacity • Large number of logical shards distributed over few physical machines • How to shard • Consistent Hashing • Hash(key) say “2”, move clockwise to find nearest node “B” • Hash to create a uniform distribution • Allows for adding of machines • Without redesign of hashing • Without massive movement of data • Load balancing • If a machine can’t handle load, move shards to another machine A,B,C = nodes 1,2,3,4 = keys
Vector Clocks • Determine history of a key-value, think data versioning • Need to interpret causality of events, to identify missed updates & detect inconsistencies • Timestamps are one solution, but rely on synchronized clocks and don't capture causality • Vector clocks are an alternative method of capturing order in a distributed system • Node (i) coordinating the write generates a clock at the time of write • Construct: ( nodeid : logical sequence) • Node A receives first insert : A:1 • Node B receives an update : A:2 • Clock: (A:1) is predecessor of (A:1, B:1) • Clock: (A:2) and (A:1, B:1) are concurrent • Reads use vector clocks to detect missed updates & inconsistencies
Server Node C Server Node B Client Server Node A Eventual Consistency Example 3/2/2 3 replicas Application PUT V1 A:1 Write V1 A:1 Write V1 A:1 Write PUT NODE 1 DOWN PUT Write V2 (A:1,B:1) V2 (A:1,B:1) PUT NODE 1 UP Outage Upgrade GET Read V1 A:1 Detect Missing Update Read V2 (A:1,B:1) V2 (A:1,B:1) Read GET V2 (A:1,B:1) Read Repair V2 (A:1,B:1) Missed update added
NoSQL with MySQL • Key-Value store implemented on top of MySQL/ InnoDB • MySQL serves as persistence layer, Java code handles the rest • Atomicity limited to one operation (auto commit), Consistency & Isolation limited to one machine (recall issues with 2PC) • Why MySQL ? • Easy to install, tune and restore • Fast, Reliable, Proven • Performance : • MVCC, row level locking • Buffer pool, insert buffering, direct IO • Data Protection & Integrity : • Automatic crash recovery • Backup, Recovery (full and incremental) • Leverage expertise across ACID and BASE systems
Implementation • InnoDB Engine (innodb_file_per_table) • Simple schema • Auto increment PK • App Key, Value • Metadata • Hash of app key (index) • Clustered Index and Secondary Index • KV use simple queries – no joins or grouping functions • Database is a shard (pick right number of shards) • Backup & Recovery
Tuning • Queries are so simple that tuning is focused on IOPS, Memory, TCP, Sockets, Timeouts, Connection Pool, JDBC, JVM , Queuing etc • Memory to disk ratio • Connector/JuseServerPrepStmts=true useLocalSessionState = truecachePrepStmts = truecacheResultSetMetadata = truetcpKeepAlive=truetcpTrafficClass=24autoReconnect=truemaxReconnects=...allowMultiQueries=trueprepStmtCacheSize=…blobSendChunkSize=…largeRowSizeThreshold=…locatorFetchBufferSize=… • CentOS 5.5, XFS File System
Scalability Test (TPS/IOPS) EC2’s Characteristics m1.xlarge image User Dataset = 256GB (240M 1K values) Machines = 5 Spindles=4 per box XFS File System Buffer Pool=12GB/machine Memory:Disk=1:4 Workload R/W=100/0 Random Access Pattern N/R/W=1/1/1 Incremental load to 100 threads T P S Client Threads
Performance Test Characteristics User Dataset = 900GB Machines = 9 Spindles=8 (15K)/machine XFS File System (RAID0) Buffer Pool=16GB/machine Memory:Disk=1:6 Cache=16% Workload R/W=100/0 Recency Skew i.e Recent Access=80/20 N/R/W=3/2/2 Incremental load to 10K (TPS) S E C S TPS
Tips By Peter Zaitsev Percona
MySQL for NoSQL • Tune for Simple Queries • Memory is the most important • innodb_buffer_pool_size • Are we looking at intensive writes ? • innodb_log_file_size • RAID with BBU • XFS + O_DIRECT (avoid per inode locking) • Restrict Concurrency on database • Connection pool size or innodb_thread_concurrency
More on Scalability • Is contention problem • Use MySQL 5.5 or Percona Server • Tune innodb_buffer_pool_instances • Index update contention • Use Multiple Tables • Or MySQL Partitions if you can't • Eliminate Query Parsing Overhead • Consider HandlerSocket • NoSQL Interface to InnoDB Storage Engine
Work on Result Stability • Ensure you're getting stable results • InnoDB Flushing can cause non uniform performance
THANK YOU Yekesa.kosuru@nokia.com