Cassandra Training

Cassandra Training Introduction & Data Modeling

Aims
• By the end of today you should know:
  • How Cassandra organises data
  • How to configure replicas
  • How to choose between consistency and availability
  • How to efficiently model data for both reads and writes
  • That you need to consider Active-Active scenarios
  • Who to ask to help you and sign off on your data model
    • HINT: Ask Neil directly or email harch@expedia.com.

Introduction to Cassandra

Agenda - 100ft
• Quick Introduction
• Data Structures
• Efficient Data Modeling
• Data Modeling Examples

Elevator Pitch
What?
• Write path optimised
• Eventually consistent (ms)
• Distributed Hash Table
• Highly durable
• Tunable consistency

DHT 101
• Each physical node is assigned a token
• Each node owns the range from the previous node's token (exclusive) up to its own token (inclusive)

Cassandra Write Path
• The coordinator sends the update to the replica nodes (two in this example), starting at the owning node and working clockwise around the ring

Cassandra Write Path
• A 128-bit hash of the partition key determines the row's token
• Keys are therefore distributed randomly around the ring
• If a replica is unavailable: Hinted Handoff
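The ring lookup and clockwise replica placement described on the last three slides can be sketched in a few lines of Java. This is a toy model under stated assumptions, not Cassandra's actual implementation: the class name TokenRing, the node names and the evenly spaced tokens are all invented for illustration, and MD5 stands in for the 128-bit hash.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy token ring (hypothetical, for illustration only). */
final class TokenRing {
    // token -> node name, kept sorted so the ring can be walked clockwise
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();

    void addNode(String name, BigInteger token) {
        ring.put(token, name);
    }

    /** 128-bit MD5 hash of the partition key, read as a non-negative integer. */
    static BigInteger tokenFor(String partitionKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // signum 1: treat the bytes as unsigned
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Replicas for a token: the owning node, then the next nodes clockwise. */
    List<String> replicasFor(BigInteger token, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        // The owner is the first node whose token is >= the key's token,
        // wrapping around to the lowest token at the end of the ring.
        BigInteger current = ring.ceilingKey(token);
        if (current == null) {
            current = ring.firstKey();
        }
        int wanted = Math.min(replicationFactor, ring.size());
        while (replicas.size() < wanted) {
            replicas.add(ring.get(current));
            current = ring.higherKey(current);
            if (current == null) {
                current = ring.firstKey();
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        TokenRing ring = new TokenRing();
        BigInteger max = BigInteger.ONE.shiftLeft(128);
        // Four hypothetical nodes with evenly spaced tokens.
        for (int i = 0; i < 4; i++) {
            BigInteger token = max.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(4));
            ring.addNode("node" + (i + 1), token);
        }
        BigInteger token = tokenFor("20121004");
        System.out.println(token + " -> " + ring.replicasFor(token, 2));
    }
}
```

Keeping the tokens in a sorted map makes "find the owner, then walk clockwise" a ceiling lookup followed by successor lookups, which is the essence of the DHT placement.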

Cassandra Write Path
• SSTables are sequential and immutable
• Data may reside across SSTables
• SSTables are periodically compacted together

Cassandra Read Path
• Data read command is sent to the closest replica, as determined by the snitch
• Digest commands are sent to other replicas, as many as the consistency level (CL) requires
• Read repair chance 10%: digest requests go to all replicas

Start & Interrogate C*
• vagrant box add dse.box http://htraining.s3.amazonaws.com/dse.box
• mkdir ~/vagrant
• curl http://htraining.s3.amazonaws.com/vagrant-dse.tar.gz > ~/vagrant/dse.tar.gz
• cd ~/vagrant && tar xzvf dse.tar.gz
• cd dse && vagrant up
• vagrant ssh node1
• nodetool ring

Cassandra Read Path
Read Mechanics
• Find candidate SSTables: Bloom filters
• Seek through SSTables: memory-mapped files
• Check memtable
-> Minimise SSTables for best efficiency
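A Bloom filter answers "might this SSTable contain the key?" without touching disk: it can return false positives but never false negatives, so a "no" lets the read skip that SSTable. The following is a minimal sketch of the idea only; the class name, sizing and hash functions are assumptions chosen for brevity and are not what Cassandra uses.

```java
import java.util.BitSet;

/** Minimal Bloom filter sketch (hypothetical; not Cassandra's implementation). */
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derive the i-th bit position from two base hashes (double hashing).
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // cheap second hash, forced odd so it is never zero
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String key) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(key, i));
        }
    }

    /** false means "definitely absent"; true means "possibly present". */
    boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1024, 3);
        filter.add("20121004");
        System.out.println(filter.mightContain("20121004")); // a key that was added is never reported absent
    }
}
```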

Deletion & Tombstones
• Deleted data is marked as removed with a tombstone
• Stops zombie data reappearing in a distributed system
• Tombstones are collected after a few days (configurable via gc_grace_seconds)
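Why a tombstone is needed can be shown with a small last-write-wins merge. In this sketch, which assumes a simplified cell model with invented names (ReplicaCell, reconcile), a replica that missed the delete still holds the old value; merging it with the tombstone keeps the data deleted, whereas merging it with "nothing" would bring the value back as zombie data. Preferring the tombstone on a timestamp tie is an assumption of this sketch.

```java
/** Simplified cell with last-write-wins reconciliation (hypothetical sketch). */
final class ReplicaCell {
    final String value;      // null when this cell is a tombstone
    final long timestamp;    // write time supplied by the client
    final boolean tombstone;

    private ReplicaCell(String value, long timestamp, boolean tombstone) {
        this.value = value;
        this.timestamp = timestamp;
        this.tombstone = tombstone;
    }

    static ReplicaCell live(String value, long timestamp) {
        return new ReplicaCell(value, timestamp, false);
    }

    static ReplicaCell deleted(long timestamp) {
        return new ReplicaCell(null, timestamp, true);
    }

    /** Last write wins; null means the replica holds nothing for this column. */
    static ReplicaCell reconcile(ReplicaCell a, ReplicaCell b) {
        if (a == null) {
            return b;
        }
        if (b == null) {
            return a;
        }
        if (a.timestamp != b.timestamp) {
            return a.timestamp > b.timestamp ? a : b;
        }
        return a.tombstone ? a : b; // tie: prefer the delete (assumption of this sketch)
    }

    public static void main(String[] args) {
        ReplicaCell stale = live("old value", 1L);
        ReplicaCell delete = deleted(2L);
        // With the tombstone present, the delete wins over the stale replica.
        System.out.println(reconcile(stale, delete).tombstone);
        // If the deleting replica had simply dropped the data, the stale value would win.
        System.out.println(reconcile(stale, null).value);
    }
}
```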

Brewer’s Theorem
Distributed data: only 2 at a time
• Consistency
• Availability
• Partition Tolerance

Brewer’s Theorem
• CA: normal operation, no partition, consistency and availability provided

Brewer’s Theorem
• AP: a partition occurs; maintaining two mutable, disconnected copies of the state breaks consistency, but availability is conserved

Brewer’s Theorem
• CP: a partition occurs; to maintain consistency we need to take one side offline, sacrificing availability

Tuneable Consistency
• Cassandra Consistency Level (CL)
• Specify how many replicas must agree on each read or write
• Choose consistency or availability: CL.LOCAL_QUORUM, CL.ONE
• Eventual consistency brings both sides back into agreement over time
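The trade-off can be put into numbers. A quorum is a majority of the replicas, and a read is guaranteed to overlap the latest acknowledged write when the number of replicas read (R) plus the number written (W) exceeds the replication factor (RF). The helper names below are invented for this sketch.

```java
/** Consistency-level arithmetic (illustrative helper, names are hypothetical). */
final class ConsistencyMath {
    /** Majority of the replicas: floor(RF / 2) + 1. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True when every read set must overlap every write set: R + W > RF. */
    static boolean readsSeeLatestWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("quorum for RF=3: " + q);
        System.out.println("QUORUM/QUORUM consistent: " + readsSeeLatestWrite(q, q, rf));
        System.out.println("ONE/ONE consistent: " + readsSeeLatestWrite(1, 1, rf));
    }
}
```

With RF=3, QUORUM reads and writes (2 + 2 > 3) overlap on at least one replica, while ONE/ONE (1 + 1 <= 3) favours availability and relies on eventual consistency.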

Data Model
Keyspace
• Analogous to Database/Schema
• Segregate applications
• Replication configured at this level

Data Model
Column Family
• Analogous to Table
• Contains many rows
• Caches configurable at this level

Data Model
Row
• Each one has a partition key, which is hashed
• Has many columns: up to 2Bn
• Columns don't have to be defined ahead of time
• Rows in the same CF can have different columns
• Rows are not sorted; model ordering within a row, using its columns

Data Model
Columns
• Sorted by name before being written to SSTable
• Name and Value are typed
• Values can be type-validated
• Column update is timestamped
• Can have TTL

Data Model
Counter Columns
• Distributed counters
• Can get false counts

Data Model
Super Columns: Don't Use
• Blob of columns stored inside a single column
• Have to read and write the whole blob
• Memory intensive
• Conflicts are resolved for the whole blob, which is bad

Secondary Indices
• Can define an index on a column
• Cassandra will maintain an inverted index
• Use sparingly
• Low-cardinality columns only
• Often better to maintain your own view

Thrift vs CQL
Thrift
• Original interface, hash-style syntax
CQL
• SQL-like syntax but highly limited
• Sent over Thrift, but there are plans for its own protocol

Scaling Cassandra
• Imagine RF=3, CL=QUORUM, Nodes=6
• Each query waits on 2 nodes synchronously
• Each write will touch all 3 replica nodes, though asynchronously
• To scale writes, add more nodes
• To scale reads, add more replicas

Data Modelling - Concepts
• Rows in the same CF will live on different nodes
  • High cost of multi-get
  • De-normalise your data into rows
• Don't put consistent load on a single row
  • Will heat up replica nodes

Data Modelling - Concepts
• Writes to a single row are atomic and isolated
• Columns are ordered
  • Column range slicing is efficient
• Mutating data often needs compaction tuning

Wide Rows
• Efficient reads
• Store how you want to fetch
• Fetching is most efficient over few rows
• Store what you want to fetch in few rows

Time Series
• Use the timestamp for the column name: columns are ordered
• Range slicing is efficient
• Can limit row length by using a date partition key, e.g. 20121004
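Because columns within a row are kept sorted by name, a time-range query is a contiguous slice rather than a scan. The sketch below models one row as a sorted map from timestamp to value; it is an in-memory analogy with invented names (TimeSeriesRow, slice), not a client API.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/** One wide row modelled as a sorted map of timestamp -> value (analogy only). */
final class TimeSeriesRow {
    private final NavigableMap<Long, String> columns = new TreeMap<>();

    void put(long timestampMillis, String value) {
        columns.put(timestampMillis, value);
    }

    /** Columns with fromInclusive <= name < toExclusive, in timestamp order. */
    NavigableMap<Long, String> slice(long fromInclusive, long toExclusive) {
        return columns.subMap(fromInclusive, true, toExclusive, false);
    }

    public static void main(String[] args) {
        TimeSeriesRow row = new TimeSeriesRow(); // e.g. the row for partition key 20121004
        row.put(100L, "started");
        row.put(200L, "it broke");
        row.put(300L, "recovered");
        System.out.println(row.slice(150L, 350L));
    }
}
```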

Composite Columns
• Composite Column, e.g.
    time1:log_class, time1:log_message, time2:log_class, time2:log_message

Time Series
• Writing to a single row creates hotspots
• Use round robin over several rows, e.g. 20121004:1, 20121004:2, etc.
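One way to produce the round-robin row keys is a counter that cycles through a fixed number of buckets per day. The following sketch assumes a bucket count chosen up front and UTC day boundaries; the class and method names are hypothetical. Note that readers must then fetch every bucket for a day and merge the results.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Generates day:bucket row keys in round-robin order (hypothetical sketch). */
final class BucketedRowKeys {
    private final int buckets;
    private final AtomicLong counter = new AtomicLong();

    BucketedRowKeys(int buckets) {
        if (buckets < 1) {
            throw new IllegalArgumentException("buckets must be >= 1");
        }
        this.buckets = buckets;
    }

    // Formats the UTC calendar day as yyyyMMdd, e.g. 20121004.
    private static String dayOf(Instant timestamp) {
        return DateTimeFormatter.BASIC_ISO_DATE.format(timestamp.atZone(ZoneOffset.UTC).toLocalDate());
    }

    /** Row key for the next write: the day plus the next bucket, 1..buckets. */
    String nextWriteKey(Instant timestamp) {
        long bucket = counter.getAndIncrement() % buckets + 1;
        return dayOf(timestamp) + ":" + bucket;
    }

    /** All row keys a reader must fetch to see the whole day. */
    List<String> readKeys(Instant timestamp) {
        List<String> keys = new ArrayList<>();
        for (int b = 1; b <= buckets; b++) {
            keys.add(dayOf(timestamp) + ":" + b);
        }
        return keys;
    }

    public static void main(String[] args) {
        BucketedRowKeys keys = new BucketedRowKeys(2);
        Instant ts = Instant.parse("2012-10-04T10:15:30Z");
        System.out.println(keys.nextWriteKey(ts));
        System.out.println(keys.nextWriteKey(ts));
        System.out.println(keys.readKeys(ts));
    }
}
```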

Compound Keys
• Compound Key in CQL3
• Partition Key is the row key
• Compound Key = Partition Key + Composite Key
• e.g. partition key = 20121004, composite key = time1
    20121004 => time1:name, time1:msg, time2:name, time2:msg

Working with CQL
• cqlsh -3 192.168.33.21
• CREATE KEYSPACE my_app_data
    WITH strategy_class = SimpleStrategy
    AND strategy_options:replication_factor = 2;
• DESCRIBE KEYSPACE my_app_data;

Compound Keys
USE my_app_data;
CREATE COLUMNFAMILY logs (
  day text,          -- partition key
  log_id timeuuid,   -- clustering column
  log_class text,
  log_message text,
  primary key (day, log_id)
);
DESCRIBE columnfamilies;

Compound Keys
INSERT INTO logs (day, log_id, log_class, log_message)
  VALUES ('20130604', '2013-06-04 10:05:00', 'error', 'it broke')
  USING CONSISTENCY ONE;
INSERT INTO logs (day, log_id, log_class, log_message)
  VALUES ('20130604', '2013-06-04 11:05:00', 'error', 'it broke again')
  USING CONSISTENCY QUORUM;

Compound Keys
Setting CL and range querying columns, losing consistency
SELECT * FROM logs USING CONSISTENCY ONE
  WHERE day='20130604';
SELECT * FROM logs USING CONSISTENCY QUORUM
  WHERE day='20130604' AND log_id > '2013-06-04 11:00:00';
• Try with CL.TWO after running: vagrant suspend node2

Compound Keys
See the raw Cassandra data
• cassandra-cli -h 192.168.33.21
• use my_app_data;
• list logs;

Code Example - Clients
Hector
• Solid Java client
• In use in production
• Round robin
• Node discovery

Code Example - Clients
Astyanax
• Netflix open source library
• Simpler APIs

Code Example
• Example: Storing Payment Methods
• https://github.com/neilbeveridge/example-compoundkeys

Code Example
Requirements
• Store 1-10 payment methods
• Use a single row

Code Example
Non-CQL: define a composite column class
public static final class Composite {
    private @Component(ordinal = 0) String paymentUuid; // first component: groups the fields of one payment
    private @Component(ordinal = 1) String field;       // second component: the field name
    // constructors and getters omitted
}

Code Example
Writing Data
UUID paymentUUID = TimeUUIDUtils.getUniqueTimeUUIDinMillis();
String sPaymentUUID = paymentUUID.toString();
// one column per field, all sharing the payment UUID as the first component
batch.withRow(PAYMENTS_CF, userId)
    .putColumn(new Composite(sPaymentUUID, "pvtoken"), paymentInfo.pvToken, null)
    .putColumn(new Composite(sPaymentUUID, "name"), paymentInfo.name, null)
    .putColumn(new Composite(sPaymentUUID, "number"), paymentInfo.number, null);

Code Example
Reading Data
Need some logic to handle record boundaries
// handle the payment info boundary
if (lastSeen != null && !column.getName().getPaymentUuid().equals(lastSeen)) {
    payments.add(payment);
    payment = new PaymentInfo();
    payment.paymentUUID = UUID.fromString(column.getName().paymentUuid);
}
lastSeen = column.getName().getPaymentUuid();
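The boundary handling can be tried out without a cluster. The sketch below is a self-contained variant that uses plain Java types in place of the Astyanax column classes; the Column stand-in and the group method are invented for illustration. It differs slightly from the slide's fragment: it opens a new record whenever the payment UUID changes, instead of closing the previous record at the boundary, so no flush is needed after the loop.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Groups composite columns of one row back into records (hypothetical sketch). */
final class PaymentGrouping {
    /** Stand-in for a composite column: (paymentUuid, field) -> value. */
    static final class Column {
        final String paymentUuid;
        final String field;
        final String value;

        Column(String paymentUuid, String field, String value) {
            this.paymentUuid = paymentUuid;
            this.field = field;
            this.value = value;
        }
    }

    /** Expects columns sorted by paymentUuid, as they are within a row. */
    static List<Map<String, String>> group(List<Column> columns) {
        List<Map<String, String>> payments = new ArrayList<>();
        Map<String, String> current = null;
        String lastSeen = null;
        for (Column column : columns) {
            // A change in the first composite component marks a record boundary.
            if (!column.paymentUuid.equals(lastSeen)) {
                current = new LinkedHashMap<>();
                current.put("paymentUuid", column.paymentUuid);
                payments.add(current);
                lastSeen = column.paymentUuid;
            }
            current.put(column.field, column.value);
        }
        return payments;
    }

    public static void main(String[] args) {
        List<Column> row = new ArrayList<>();
        row.add(new Column("uuid-1", "name", "Personal card"));
        row.add(new Column("uuid-1", "number", "1111"));
        row.add(new Column("uuid-2", "name", "Business card"));
        row.add(new Column("uuid-2", "number", "2222"));
        System.out.println(group(row));
    }
}
```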

Code Example
• A bit messy

Code Example
CQL3
• Need to define a schema
• Cassandra needs it to split up the row for us
