
Introduction to NoSQL Cassandra HBase



Presentation Transcript


  1. Introduction to NoSQL, Cassandra, HBase • Many thanks to D. Tsoumakos for the slides • Material adapted from slides by: Perry Hoekstra and Gary Dusbabek (Rackspace) • CS 525, Indranil Gupta

  2. History of the World • Relational Databases – mainstay of business • Web-based applications caused spikes • Especially true for public-facing e-Commerce sites • Developers began to front RDBMS with memcache or to integrate other caching mechanisms within the application (e.g., Ehcache)

  3. SQL • Specialized data structures (think B-trees) • Shines with complicated queries • Focus on fast query & analysis • Not necessarily on large datasets

  4. Scaling Up • Issues with scaling up when the dataset is just too big • RDBMS were not designed to be distributed • Began to look at multi-node database solutions • Known as ‘scaling out’ or ‘horizontal scaling’ • Different approaches include: • Master-slave • Sharding

  5. Scaling RDBMS – Master/Slave • Master-Slave • All writes are written to the master. All reads performed against the replicated slave databases • Critical reads may be incorrect as writes may not have been propagated down • Large data sets can pose problems as master needs to duplicate data to slaves

  6. Scaling RDBMS - Sharding • Partitioning or sharding • Scales well for both reads and writes • Not transparent: the application needs to be partition-aware • Can no longer have relationships/joins across partitions • Loss of referential integrity across shards

  7. What is NoSQL? • Stands for Not Only SQL • Class of non-relational data storage systems • Usually do not require a fixed table schema nor do they use the concept of joins • All NoSQL offerings relax one or more of the ACID properties (will talk about the CAP theorem)

  8. How did we get here? • Explosion of social media sites (Facebook, Twitter) with large data needs • Rise of cloud-based solutions such as Amazon S3 (Simple Storage Service) • Just as developers moved to dynamically typed languages (Ruby/Groovy), there was a shift to dynamically typed data with frequent schema changes • Open-source community

  9. More Programming and Less Database Design • Alternative to traditional relational DBMS • Flexible schema • Quicker/cheaper to set up • Massive scalability • Relaxed consistency → higher performance & availability • No declarative query language → more programming • Relaxed consistency → fewer guarantees

  10. Challenge: Coordination • The solution to availability and scalability is to decentralize and replicate functions and data…but how do we coordinate the nodes? • data consistency • update propagation • mutual exclusion • consistent global states • group membership • group communication • event ordering • distributed consensus • quorum consensus

  11. Dynamo and BigTable • Three major papers were the seeds of the NoSQL movement • BigTable (Google) • Dynamo (Amazon): gossip protocol (discovery and error detection), distributed key-value data store, eventual consistency • CAP Theorem

  12. CAP Theorem • Proposed by Eric Brewer (Berkeley) • Subsequently proved by Gilbert and Lynch • In a distributed system you can satisfy at most 2 out of the 3 guarantees • Consistency: all nodes have same data at any time • Availability: the system allows operations all the time • Partition-tolerance: the system continues to work in spite of network partitions

  13. CAP Theorem • Three properties of a system: consistency, availability and partitions • “You can have at most two of these three properties for any shared-data system” • To scale out, you have to partition. That leaves either consistency or availability to choose from • In almost all cases, you would choose availability over consistency

  14. CAP Theorem

  15. Consistency, Availability, Partition-resilience • Fox & Brewer "CAP Theorem": C-A-P, choose two • Claim: every distributed system is on one side of the triangle • CP: always consistent, even in a partition, but a reachable replica may deny service without agreement of the others (e.g., quorum) • CA: available and consistent, unless there is a partition • AP: a reachable replica provides service even in a partition, but may be inconsistent if there is a failure • (Figure: the C-A-P triangle with vertices Consistency, Availability, Partition-resilience)

  16. Availability • Traditionally thought of as the server/process being available "five 9s" of the time (99.999%) • However, in a large multi-node system, at almost any point in time there's a good chance that a node is either down or that there is a network disruption among the nodes • Want a system that is resilient in the face of network disruption

  17. Consistency Model • A consistency model determines rules for visibility and apparent order of updates. • For example: • Row X is replicated on nodes M and N • Client A writes row X to node N • Some period of time t elapses. • Client B reads row X from node M • Does client B see the write from client A? • Consistency is a continuum with tradeoffs • For NoSQL, the answer would be: maybe • CAP Theorem states: Strict Consistency can't be achieved at the same time as availability and partition-tolerance.
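
The row-X scenario above is easy to see in a toy simulation. The sketch below (plain Python, not any real database client; the replica names M and N come from the slide) shows how a read against a replica that has not yet received the propagated write returns stale data.

```python
# Row X is replicated on nodes M and N; a write lands on N first, and a
# reader hitting M may or may not see it, depending on whether the
# asynchronous propagation has already happened.

class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}            # row key -> value

    def write(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)

node_m = Replica("M")
node_n = Replica("N")

# Client A writes row X to node N.
node_n.write("X", "v1")

# Client B reads row X from node M before propagation: stale read.
print(node_m.read("X"))           # None

# Some period of time t elapses and the update propagates to M.
node_m.write("X", node_n.read("X"))

# Client B reads again: now it sees the write.
print(node_m.read("X"))           # 'v1'
```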

  18. Eventual Consistency • When no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent • For a given accepted update and a given node, eventually either the update reaches the node or the node is removed from service • Known as BASE (Basically Available, Soft state, Eventual consistency), as opposed to ACID • Basically Available – possibility of faults, but not a fault of the whole system • Soft state: copies of a data item may be inconsistent • Eventually Consistent – copies become consistent at some later time if there are no more updates to that data item

  19. What kinds of NoSQL • NoSQL solutions fall into two major areas: • Key/Value or ‘the big hash table’. • Amazon S3 (Dynamo) • Voldemort • Scalaris • Schema-less which comes in multiple flavors, column-based, document-based or graph-based. • Cassandra (column-based) • CouchDB (document-based) • Neo4J (graph-based) • HBase (column-based)

  20. NoSQL Categories

  21. Categories of NoSQL databases • Key-value stores • Column NoSQL databases • Document-based • Graph database (neo4j, InfoGrid) • XML databases (myXMLDB, Tamino, Sedna)

  22. Key/Value • Pros: • very fast • very scalable • simple model • able to distribute horizontally • Cons: • many data structures (objects) can't be easily modeled as key/value pairs

  23. Schema-Less • Pros: • schema-less data model is richer than key/value pairs • eventual consistency • many are distributed • still provide excellent performance and scalability • Cons: • typically no ACID transactions or joins

  24. Common Advantages • Cheap, easy to implement (open source) • Data are replicated to multiple nodes (therefore identical and fault-tolerant) and can be partitioned • Down nodes easily replaced • No single point of failure • Easy to distribute • Don't require a schema • Can scale up and down • Relax the data consistency requirement (CAP)

  25. Typical NoSQL API • Basic API access: • get(key) -- Extract the value given a key • put(key, value) -- Create or update the value given its key • delete(key) -- Remove the key and its associated value • execute(key, operation, parameters) -- Invoke an operation on the value (given its key), where the value is a special data structure (e.g., list, set, map, etc.)
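
As a rough illustration of this API shape, here is a minimal in-memory key/value store in Python. The class name and the dict-backed storage are hypothetical; real stores differ mainly in how they distribute and persist the data.

```python
# A minimal sketch of the basic API above (hypothetical class, not any
# particular product's client library).

class KeyValueStore:
    def __init__(self):
        self._data = {}

    def get(self, key):
        """Extract the value given a key."""
        return self._data.get(key)

    def put(self, key, value):
        """Create or update the value given its key."""
        self._data[key] = value

    def delete(self, key):
        """Remove the key and its associated value."""
        self._data.pop(key, None)

    def execute(self, key, operation, *parameters):
        """Invoke an operation on a structured value (e.g. a list, set, map)."""
        return getattr(self._data[key], operation)(*parameters)

store = KeyValueStore()
store.put("user:42", {"name": "Ada"})
store.execute("user:42", "update", {"email": "ada@example.com"})
print(store.get("user:42"))   # {'name': 'Ada', 'email': 'ada@example.com'}
store.delete("user:42")
```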

  26. What am I giving up? • joins • group by • order by • ACID transactions • SQL as a sometimes frustrating but still powerful query language • easy integration with other applications that support SQL

  27. Apache Cassandra

  28. Cassandra • Amazon Dynamo • Consistent hashing • Partitioning • Replication • One-hop routing • Google BigTable • Column Families • Memtables • SSTables

  29. Origins (timeline: pre-2008, 2008, 2009)

  30. Distributed and Scalable • Horizontal! • All nodes are identical • No master or single point of failure • Adding nodes is simple • Automatic cluster maintenance

  31. Data Model • Within Cassandra, you will refer to data this way: • Column: smallest data element, a tuple with a name and a value

  32. Data Model A single column

  33. Data Model A single row

  34. Data Model

  35. Data Model • ColumnFamily: Rockets • Each row key has its own set of (name, value) columns: • Key 1: name = "Acme Jet Propelled Unicycle", toon = "Beep Prepared", inventoryQty = 4, wheels = false • Key 2: name = "Little Giant Do-It-Yourself Rocket-Sled Kit", toon = "Ready, Set, Zoom", inventoryQty = 5, brakes = 1 • Key 3: name = "Rocket-Powered Roller Skates", toon = "Hot Rod and Reel", inventoryQty = 1, brakes = false
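
One way to picture this column family is as a map from row keys to maps of column names and values, as sketched below in Python. It follows the row/column pairing reconstructed above, which is approximate; the point is the structure, not the exact data.

```python
# The Rockets ColumnFamily as nested maps: row key -> {column name -> value}.
# Rows are sparse: some rows carry a "wheels" column, others a "brakes" one.

rockets = {
    "1": {"name": "Acme Jet Propelled Unicycle",
          "toon": "Beep Prepared",
          "inventoryQty": 4,
          "wheels": False},
    "2": {"name": "Little Giant Do-It-Yourself Rocket-Sled Kit",
          "toon": "Ready, Set, Zoom",
          "inventoryQty": 5,
          "brakes": 1},
    "3": {"name": "Rocket-Powered Roller Skates",
          "toon": "Hot Rod and Reel",
          "inventoryQty": 1,
          "brakes": False},
}

# Look up one column of one row by key.
print(rockets["2"]["name"])
```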

  36. Data: Schema-free Sparse-table • Flexible column naming • You define the sort order • Not required to have a specific column just because another row does

  37. Consistent Hashing • Partition using consistent hashing • Keys hash to a point on a fixed circular space • The ring is partitioned into a set of ordered slots; servers and keys are hashed onto these slots • Nodes take positions on the circle • Example: A, B, and D exist • B is responsible for the AB range • D is responsible for the BD range • A is responsible for the DA range • C joins • B, D split ranges • C gets the BC range from D
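
The sketch below is a minimal consistent-hash ring in Python, not Cassandra's actual partitioner: nodes and keys hash onto a fixed circular space, each key is owned by the first node clockwise from its hash position, and adding a node only takes over part of one existing node's range.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32

def ring_position(item: str) -> int:
    """Hash an item (node name or row key) to a point on the ring."""
    digest = hashlib.md5(item.encode()).hexdigest()
    return int(digest, 16) % RING_SIZE

class ConsistentHashRing:
    def __init__(self, nodes):
        self._positions = sorted(ring_position(n) for n in nodes)
        self._nodes = {ring_position(n): n for n in nodes}

    def add_node(self, node):
        pos = ring_position(node)
        bisect.insort(self._positions, pos)
        self._nodes[pos] = node

    def node_for(self, key):
        # The key is owned by the first node clockwise from its position.
        pos = ring_position(key)
        idx = bisect.bisect_right(self._positions, pos) % len(self._positions)
        return self._nodes[self._positions[idx]]

ring = ConsistentHashRing(["A", "B", "D"])
print(ring.node_for("some-row-key"))   # served by A, B, or D
ring.add_node("C")                     # C takes over part of one range
print(ring.node_for("some-row-key"))   # most keys keep their old owner
```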

  38. Inserting: Overview • Simple: put(key, col, value) • Complex: put(key, [col:value, …, col:value]) • Batch: multiple keys

  39. Inserting: Writes • Commit log for durability • Configurable fsync • Sequential writes only • Memtable – no disk access (no reads or seeks) • SSTables are final (become read-only) • Indexes • Bloom filter • Raw data • Bottom line: FAST!!!

  40. Writes • Need to be lock-free and fast (no reads or disk seeks) • Client sends the write to one front-end node in the Cassandra cluster • Front-end = coordinator, assigned per key • The coordinator (via the partitioning function) sends the write to all replica nodes responsible for the key • Always writable: hinted handoff mechanism • If any replica is down, the coordinator writes to all other replicas and keeps the write locally until the down replica comes back up • When all replicas are down, the coordinator (front end) buffers writes (for up to a few hours) • Provides atomicity for a given key (i.e., within a ColumnFamily) • One ring per datacenter • A per-DC coordinator is elected to coordinate with other DCs • Election done via Zookeeper, which runs a Paxos variant
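
A stripped-down sketch of the coordinator and hinted-handoff idea described above (class and method names are made up for illustration; real Cassandra also handles timeouts, consistency levels, and hint expiry):

```python
# The coordinator sends the write to every replica responsible for the key;
# if a replica is down, it stores a "hint" locally and replays it when the
# replica comes back up, keeping the cluster always writable.

class ReplicaNode:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

    def apply(self, key, columns):
        self.data.setdefault(key, {}).update(columns)

class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []                  # (down replica, key, columns)

    def write(self, key, columns):
        for replica in self.replicas:
            if replica.up:
                replica.apply(key, columns)
            else:
                self.hints.append((replica, key, columns))   # hinted handoff

    def replay_hints(self):
        remaining = []
        for replica, key, columns in self.hints:
            if replica.up:
                replica.apply(key, columns)
            else:
                remaining.append((replica, key, columns))
        self.hints = remaining

r1, r2, r3 = ReplicaNode("r1"), ReplicaNode("r2"), ReplicaNode("r3")
r2.up = False
coord = Coordinator([r1, r2, r3])
coord.write("row-1", {"col": "value"})   # r2 misses the write, hint is kept
r2.up = True
coord.replay_hints()                     # r2 catches up
```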

  41. Writes at a replica node • On receiving a write: • 1. Log it in the on-disk commit log • 2. Make changes to the appropriate memtables • In-memory representation of multiple key-value pairs • Later, when the memtable is full or old, flush to disk • Data file: an SSTable (Sorted String Table) – list of key-value pairs, sorted by key • Index file: an SSTable of (key, position in data SSTable) pairs • And a Bloom filter (for efficient search) – next slide • Compaction: data updates accumulate over time, and SSTables and logs need to be compacted • Merge SSTables, e.g., by merging updates for a key • Run periodically and locally at each server • Reads need to touch the log and multiple SSTables • A row may be split over multiple SSTables • Reads may be slower than writes
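
The per-replica write path lends itself to a short sketch. The code below (illustrative Python, not Cassandra's implementation) appends to a commit log, updates an in-memory memtable, and flushes the memtable into an immutable, key-sorted SSTable; the read method shows why a row spread across several SSTables makes reads slower than writes.

```python
class ReplicaStorage:
    def __init__(self, memtable_limit=3):
        self.commit_log = []        # sequential, append-only
        self.memtable = {}          # in-memory: key -> columns
        self.sstables = []          # immutable, key-sorted lists on "disk"
        self.memtable_limit = memtable_limit

    def write(self, key, columns):
        self.commit_log.append((key, columns))              # 1. log for durability
        self.memtable.setdefault(key, {}).update(columns)   # 2. update memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # The memtable becomes an SSTable: key/value pairs sorted by key.
        # (A real SSTable also gets an index file and a Bloom filter.)
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def read(self, key):
        # A row may be split over the memtable and several SSTables,
        # which is why reads can be slower than writes.
        result = {}
        for sstable in self.sstables:
            for k, columns in sstable:
                if k == key:
                    result.update(columns)
        result.update(self.memtable.get(key, {}))
        return result
```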

  42. Querying: Overview • You need a key or keys: • Single: key=‘a’ • Range: key=‘a’ through ’f’ • And columns to retrieve: • Slice: cols={bar through kite} • By name: key=‘b’ cols={bar, cat, llama} • Nothing like SQL “WHERE col=‘faz’” • But secondary indices are being worked on (see CASSANDRA-749)
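
A toy version of these two query shapes, written against the nested-map data model used earlier (the data and helper names are invented for illustration):

```python
# Pick rows by a key range, then pick columns either by a slice (a range of
# column names) or by an explicit list of names.

rows = {
    "a": {"bar": 1, "cat": 2, "kite": 3, "zebra": 4},
    "b": {"bar": 5, "cat": 6, "llama": 7},
    "g": {"bar": 8},
}

def key_range(rows, start, end):
    return {k: v for k, v in rows.items() if start <= k <= end}

def slice_columns(columns, start, end):
    return {name: v for name, v in columns.items() if start <= name <= end}

def named_columns(columns, names):
    return {name: columns[name] for name in names if name in columns}

# Range of keys 'a' through 'f', slice of columns bar through kite:
for key, cols in key_range(rows, "a", "f").items():
    print(key, slice_columns(cols, "bar", "kite"))

# Single key 'b', columns selected by name:
print(named_columns(rows["b"], ["bar", "cat", "llama"]))
```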

  43. Querying: Reads • Practically lock-free • SSTable proliferation • New in 0.6: • Row cache (avoids SSTable lookup; not write-through) • Key cache (avoids index scan)

  44. Deletes and Reads • Delete: don't delete the item right away • Add a tombstone to the log • Compaction will eventually remove the tombstone and delete the item • Read: similar to writes, except • The coordinator contacts a number of replicas (e.g., in the same rack) specified by the consistency level • Forwards the read to the replicas that have responded quickest in the past • Returns the value with the latest timestamp • The coordinator also fetches the value from multiple replicas • Checks consistency in the background, initiating a read repair if any two values differ • This brings all replicas up to date • A row may be split across multiple SSTables => reads need to touch multiple SSTables => reads are slower than writes (but still fast)
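
A compact sketch of the read-with-repair and tombstone ideas above, assuming each replica is just a dict mapping a row key to a (timestamp, value) pair (an illustration, not Cassandra's actual read path):

```python
TOMBSTONE = object()   # a delete writes this marker instead of removing the row

def read_with_repair(replicas, key):
    """replicas: list of dicts mapping row key -> (timestamp, value)."""
    versions = [r[key] for r in replicas if key in r]
    if not versions:
        return None
    latest_ts, latest_value = max(versions, key=lambda v: v[0])
    # Read repair: bring any replica holding an older version up to date.
    for r in replicas:
        if r.get(key, (None, None))[0] != latest_ts:
            r[key] = (latest_ts, latest_value)
    return None if latest_value is TOMBSTONE else latest_value

r1 = {"row-1": (10, "old")}
r2 = {"row-1": (20, "new")}
print(read_with_repair([r1, r2], "row-1"))   # 'new'; r1 is repaired in passing

# A delete just writes a tombstone with a newer timestamp; compaction
# removes the row (and the tombstone) later.
r2["row-1"] = (30, TOMBSTONE)
print(read_with_repair([r1, r2], "row-1"))   # None
```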

  45. Bloom Filter • Compact way of representing a set of items • Checking for existence in the set is cheap • Some probability of false positives: an item not in the set may check true as being in the set • Never false negatives • On insert, set all hashed bits in a large bit map • On check-if-present, return true if all hashed bits are set (false positives possible, but the rate is low) • Example: k = 4 hash functions, 100 items, 3200 bits → FP rate = 0.02% • (Figure: key K is fed through Hash1 … Hashk, each hash setting one bit position in the bit map)
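
A small Bloom filter in Python matching the description above (the bit-map size and hash count mirror the slide's 3200-bit, k = 4 example; the salted MD5 hash family is just a convenient stand-in):

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=3200, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits

    def _positions(self, item):
        # k independent hash positions, derived by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.md5(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):     # on insert, set all hashed bits
            self.bits[pos] = 1

    def might_contain(self, item):
        # True if all hashed bits are set; may be a false positive,
        # but never a false negative.
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
for key in (f"key-{i}" for i in range(100)):  # 100 items, 3200 bits, k = 4
    bf.add(key)
print(bf.might_contain("key-7"))      # True (definitely inserted)
print(bf.might_contain("missing"))    # usually False; rarely a false positive
```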

  46. Cassandra uses Quorums • Quorum = way of selecting sets so that any pair of sets intersect • E.g., any arbitrary set with at least Q=N/2 +1 nodes • Where N = total number of replicas for this key • Reads • Wait for R replicas (R specified by clients) • In the background, check for consistency of remaining N-R replicas, and initiate read repair if needed • Writes come in two default flavors • Block until quorum is reached • Async: Write to any node • R = read replica count, W = write replica count • If W+R > N and W > N/2, you have consistency, i.e., each read returns the latest written value • Reasonable: (W=1, R=N) or (W=N, R=1) or (W=Q, R=Q)
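
The quorum arithmetic can be checked in a few lines. The sketch below separates the two conditions stated above: W + R > N guarantees that every read quorum overlaps every write quorum (so a read sees the latest written value), while W > N/2 additionally prevents two conflicting writes from both succeeding on disjoint replica sets.

```python
def reads_overlap_writes(n, w, r):
    # Any set of R read replicas intersects any set of W write replicas.
    return w + r > n

def writes_are_majority(n, w):
    # Two concurrent writes cannot both complete on disjoint replica sets.
    return w > n / 2

N = 3
for w, r in [(1, N), (N, 1), (N // 2 + 1, N // 2 + 1)]:
    print(f"W={w}, R={r}: "
          f"read-sees-latest={reads_overlap_writes(N, w, r)}, "
          f"majority-writes={writes_are_majority(N, w)}")
# W=1, R=3: read-sees-latest=True, majority-writes=False
# W=3, R=1: read-sees-latest=True, majority-writes=True
# W=2, R=2: read-sees-latest=True, majority-writes=True  (quorum reads/writes)
```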

  47. Wrapping Up • Use Cassandra if you want/need • High write throughput • Near-linear scalability • Automated replication/fault tolerance • Can tolerate missing RDBMS features

  48. An Introduction to Hadoop HBase
