Cassandra Training Introduction & Data Modeling
Aims
• By the end of today you should know:
• How Cassandra organises data
• How to configure replicas
• How to choose between consistency and availability
• How to efficiently model data for both reads and writes
• You need to consider Active-Active scenarios
• Who to ask to help you & sign off on your data model
• HINT: Ask Neil directly or email harch@expedia.com.
Introduction to Cassandra
Agenda – 100ft
• Quick Introduction
• Data Structures
• Efficient Data Modeling
• Data Modeling Examples
Elevator Pitch: What?
• Write path optimised
• Eventually consistent (ms)
• Distributed Hash Table
• Highly durable
• Tunable consistency
DHT 101
• Each physical node is assigned a token
• Each node owns the range from the previous node's token (exclusive) up to its own token (inclusive)
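The ownership rule on the DHT 101 slide can be sketched in a few lines of Python. This is an illustration only, not Cassandra's implementation: the four node names and the small integer tokens are made up for readability.

```python
from bisect import bisect_left

# Hypothetical 4-node ring; real tokens are far larger numbers.
RING = {0: "node1", 25: "node2", 50: "node3", 75: "node4"}
TOKENS = sorted(RING)

def owner(token):
    """Return the node owning `token`: the first node whose own token
    is >= `token`, wrapping around to the start of the ring."""
    i = bisect_left(TOKENS, token)
    return RING[TOKENS[i % len(TOKENS)]]
```

Here `owner(10)` falls in the range (0, 25] and so belongs to `node2`, while `owner(80)` wraps past the highest token back to `node1`.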
Cassandra Write Path
• The coordinator sends the update to two replica nodes, starting at the owning node and working clockwise around the ring
Cassandra Write Path
• A 128-bit hash of the partition key is used to compute the row's token
• Keys are therefore distributed randomly around the ring
• If a replica is unavailable: hinted handoff
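A rough sketch of the two write-path slides above, assuming an MD5-based partitioner and a simple "walk clockwise" replica placement. The node names, the evenly spaced tokens and the modulo reduction into the token space are assumptions made for this example, not Cassandra's exact arithmetic.

```python
import hashlib
from bisect import bisect_left

TOKEN_SPACE = 2 ** 127  # assumed token space for this sketch
NODES = ["node1", "node2", "node3", "node4"]  # hypothetical cluster
# Evenly spaced tokens, one per node, in ring order.
TOKENS = [i * TOKEN_SPACE // len(NODES) for i in range(len(NODES))]

def token_for(partition_key):
    """Hash the partition key (128-bit MD5) and reduce it to a token."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % TOKEN_SPACE

def replicas(partition_key, rf=2):
    """Return `rf` nodes: the owning node, then its clockwise neighbours."""
    start = bisect_left(TOKENS, token_for(partition_key))
    return [NODES[(start + i) % len(NODES)] for i in range(rf)]
```

Because the hash scatters keys uniformly, consecutive keys such as `20130604` and `20130605` will usually land on unrelated parts of the ring.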
Cassandra Write Path
• SSTables are sequential and immutable
• A row's data may reside across several SSTables
• SSTables are periodically compacted together
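To make the compaction bullet concrete, here is a minimal sketch that models each SSTable as a dictionary of column name to `(timestamp, value)`. The data layout is an assumption for illustration; real SSTables are on-disk files with indexes and filters.

```python
def compact(*sstables):
    """Merge several immutable SSTables into a new one, keeping only
    the newest (highest-timestamp) version of each column."""
    merged = {}
    for table in sstables:
        for name, (timestamp, value) in table.items():
            if name not in merged or timestamp > merged[name][0]:
                merged[name] = (timestamp, value)
    # The result is written out in sorted column order; inputs are untouched.
    return dict(sorted(merged.items()))
```

The inputs are never modified, which mirrors the immutability of SSTables: compaction produces a new table and the old ones are simply discarded afterwards.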
Cassandra Read Path
• Data read command sent to the closest replica, as chosen by the snitch
• Digest commands sent to other replicas, as required by the consistency level (CL)
• Read repair chance 10%: digests requested from all replicas
Start & Interrogate C*
• vagrant box add dse.box http://htraining.s3.amazonaws.com/dse.box
• mkdir ~/vagrant
• curl http://htraining.s3.amazonaws.com/vagrant-dse.tar.gz > ~/vagrant/dse.tar.gz
• cd ~/vagrant && tar xzvf dse.tar.gz
• cd dse && vagrant up
• vagrant ssh node1
• nodetool ring
Cassandra Read Path: Read Mechanics
• Find candidate SSTables using Bloom filters
• Seek through the SSTables (memory-mapped files)
• Check the memtable
• -> Minimise the number of SSTables for best efficiency
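The "find candidate SSTables" step relies on Bloom filters, which can say "definitely not here" or "possibly here". The sketch below is a toy version: the filter size, the number of hashes and the use of SHA-256 are choices made for this example and say nothing about Cassandra's internals.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, key):
        # Derive several bit positions from salted hashes of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

def candidate_sstables(sstables, key):
    """Keep only the SSTables whose filter cannot rule the key out.
    `sstables` is a list of (name, BloomFilter) pairs."""
    return [name for name, bloom in sstables if bloom.might_contain(key)]
```

Every SSTable that the filter rules out is a disk seek avoided, which is why fewer SSTables per row means cheaper reads.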
Deletion & Tombstones
• Deleted data is marked as removed with a tombstone
• Stops zombie data reappearing in a distributed system
• Tombstones are collected after gc_grace_seconds (10 days by default) – configurable
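Why a tombstone is needed can be shown with a small sketch of two replicas being reconciled. The row layout (column name to `(timestamp, value)`) and the `TOMBSTONE` marker are assumptions for this example.

```python
TOMBSTONE = object()  # marker standing in for a deleted value

def repair(replica_a, replica_b):
    """Reconcile two replicas' copies of a row: newest timestamp wins."""
    merged = {}
    for replica in (replica_a, replica_b):
        for column, (timestamp, value) in replica.items():
            if column not in merged or timestamp > merged[column][0]:
                merged[column] = (timestamp, value)
    return merged

def visible(row):
    """Columns a client would see: tombstoned columns are hidden."""
    return {c: v for c, (_, v) in row.items() if v is not TOMBSTONE}

# Replica B missed the delete and still holds the old value.
stale = {"email": (1, "old@example.com")}
# With a tombstone, the delete wins when the replicas are reconciled.
deleted_with_tombstone = {"email": (2, TOMBSTONE)}
# Without one, the replica that deleted has nothing to compare against.
deleted_without_tombstone = {}
```

Reconciling `stale` with `deleted_with_tombstone` hides the column, whereas reconciling it with `deleted_without_tombstone` brings the old value back as zombie data.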
Brewer’s Theorem
• Distributed data: only 2 of the following 3 at a time
• Consistency
• Availability
• Partition Tolerance
Brewer’s Theorem
• CA: normal operation, no partition; consistency and availability are both provided
Brewer’s Theorem
• AP: a partition occurs; maintaining two mutable, disconnected copies of the state breaks consistency, but availability is conserved
Brewer’s Theorem
• CP: a partition occurs; to maintain consistency we need to take one side offline, sacrificing availability
Tunable Consistency
• Cassandra Consistency Level (CL)
• Specify how many replicas must agree on each read/write
• Choose consistency or availability: CL.LOCAL_QUORUM, CL.ONE
• Eventual consistency will bring both sides back into agreement
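The trade-off on the consistency slide comes down to simple arithmetic on the replication factor (RF). The helper names below are invented for this sketch, and it ignores the per-datacentre aspect of LOCAL_QUORUM.

```python
def quorum(rf):
    """A quorum is a majority of the replicas."""
    return rf // 2 + 1

def reads_see_latest_write(rf, read_cl, write_cl):
    """Read and write replica sets must overlap in at least one node."""
    return read_cl + write_cl > rf

def can_serve(rf, cl, replicas_down):
    """A request succeeds only if enough replicas are still reachable."""
    return rf - replicas_down >= cl
```

With RF=3, quorum reads and writes (2 + 2 > 3) always overlap and still tolerate one replica being down. With RF=2, a request at CL.TWO fails as soon as one replica is suspended, while CL.ONE keeps working at the cost of possibly stale reads.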
Data Model: Keyspace
• Analogous to a database/schema
• Segregates applications
• Replication is configured at this level
Data Model: Column Family
• Analogous to a table
• Contains many rows
• Caches are configurable at this level
Data Model: Row
• Each one has a partition key, which is hashed
• Has many columns – up to 2Bn
• Columns don’t have to be defined ahead of time
• Rows in the same CF can have different columns
• Rows are not sorted; model ordering with the columns inside a row
Data Model: Columns
• Sorted by name before being written to an SSTable
• Name and value are typed
• Values can be type-validated
• Each column update is timestamped
• Can have a TTL
Data Model: Counter Columns
• Distributed counters
• Counts can be inaccurate (e.g. when a timed-out increment is retried)
Data Model: Super Columns – Don’t Use
• Blob of columns stored inside a single column
• Have to read and write the whole blob
• Memory intensive
• Conflicts are resolved for the whole blob – bad
Secondary Indices
• Can define an index on a column
• Cassandra will maintain an inverted index
• Use sparingly
• Low-cardinality columns only
• Often better to maintain your own view
Thrift vs CQL
• Thrift: original interface, hash-style syntax
• CQL: SQL-like syntax but highly limited
• CQL is sent over Thrift, but there are plans for its own protocol
Scaling Cassandra
• Imagine RF=3, Quorum, Nodes=6
• Each query impacts 2 nodes synchronously
• Each write will touch all 3 replica nodes, though asynchronously
• To scale writes, add more nodes
• To scale reads, add more replicas
Data Modelling - Concepts
• Rows in the same CF will live on different nodes
• High cost of multi-get
• De-normalise your data into rows
• Don’t put consistent load on a single row
• It will heat up the replica nodes
Data Modelling - Concepts
• Writes to a single row are atomic & isolated
• Columns are ordered
• Column range slicing is efficient
• Frequently mutated data often needs compaction tuning
Wide Rows: Efficient Reads
• Store how you want to fetch
• Fetching is most efficient over few rows
• Store what you want to fetch in few rows
Time Series
• Use the timestamp for the column name – ordered
• Range slicing is efficient
• Can limit row length by using a date as the partition key, e.g. 20121004
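The time-series pattern can be sketched with an in-memory stand-in for a column family: one row per day, columns kept sorted by timestamp, and range slices found by binary search. The `rows` structure and function names are assumptions for this example.

```python
from bisect import bisect_left, insort
from collections import defaultdict
from datetime import datetime

# Row key (a day such as "20121004") -> columns kept sorted by timestamp.
rows = defaultdict(list)

def write(timestamp, message):
    """Store an event in the row for its day; the date key bounds row length."""
    insort(rows[timestamp.strftime("%Y%m%d")], (timestamp, message))

def slice_range(day, start, end):
    """Return the events with start <= timestamp < end from a single row."""
    columns = rows.get(day, [])
    lo = bisect_left(columns, (start,))
    hi = bisect_left(columns, (end,))
    return columns[lo:hi]
```

Because the columns are already ordered, a slice touches one row and reads one contiguous run of columns, which is what makes range slicing cheap.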
Composite Columns
• Composite column names, e.g. time1:log_class, time1:log_message, time2:log_class, time2:log_message
Time Series
• Writing to a single row creates hotspots
• Use round robin over rows, e.g. 20121004:1, 20121004:2, etc.
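One possible way to round-robin writes over several rows per day is sketched below. The bucket count of 4 and the function names are made up for this example; the right number of buckets depends on the write rate.

```python
from itertools import count

NUM_BUCKETS = 4  # hypothetical number of rows to spread each day over
_write_counter = count()

def row_key_for_write(day):
    """Pick the next bucket in turn, e.g. 20121004:1, 20121004:2, ..."""
    bucket = next(_write_counter) % NUM_BUCKETS + 1
    return f"{day}:{bucket}"

def row_keys_for_read(day):
    """A reader must fetch every bucket for the day and merge the results."""
    return [f"{day}:{bucket}" for bucket in range(1, NUM_BUCKETS + 1)]
```

The design trade-off is that writes are spread across more replica sets, while each read of a full day becomes a multi-get over `NUM_BUCKETS` rows.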
Compound Keys
• Compound key in CQL3
• Partition key is the row key
• Compound key = partition key + composite key
• e.g. partition key = 20121004, composite key = time1
• 20121004 => time1:name, time1:msg, time2:name, time2:msg
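The mapping on this slide, from CQL3 rows to one wide storage row, can be sketched as a plain dictionary transformation. This is a simplified picture: the `clustering:column` string stands in for a real composite column name.

```python
def to_storage_rows(cql_rows, partition_key, clustering_key):
    """Flatten CQL rows into {row key: {"clustering:column": value}}."""
    storage = {}
    for row in cql_rows:
        wide_row = storage.setdefault(row[partition_key], {})
        for column, value in row.items():
            if column not in (partition_key, clustering_key):
                wide_row[f"{row[clustering_key]}:{column}"] = value
    return storage

cql_rows = [
    {"day": "20121004", "log_id": "time1", "name": "a", "msg": "first"},
    {"day": "20121004", "log_id": "time2", "name": "b", "msg": "second"},
]
storage = to_storage_rows(cql_rows, "day", "log_id")
```

Both CQL rows end up in the single storage row `20121004`, with columns `time1:name`, `time1:msg`, `time2:name` and `time2:msg`.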
Working with CQL
• cqlsh -3 192.168.33.21
• CREATE KEYSPACE my_app_data
    WITH strategy_class = SimpleStrategy
    AND strategy_options:replication_factor = 2;
• DESCRIBE KEYSPACE my_app_data;
Compound Keys
    USE my_app_data;
    CREATE COLUMNFAMILY logs (
      day text,          -- partition key
      log_id timeuuid,   -- clustering column
      log_class text,
      log_message text,
      PRIMARY KEY (day, log_id)
    );
    DESCRIBE columnfamilies;
Compound Keys
    INSERT INTO logs (day, log_id, log_class, log_message)
      VALUES ('20130604', '2013-06-04 10:05:00', 'error', 'it broke')
      USING CONSISTENCY ONE;
    INSERT INTO logs (day, log_id, log_class, log_message)
      VALUES ('20130604', '2013-06-04 11:05:00', 'error', 'it broke again')
      USING CONSISTENCY QUORUM;
Compound Keys
    SELECT * FROM logs USING CONSISTENCY ONE
      WHERE day = '20130604';
    SELECT * FROM logs USING CONSISTENCY QUORUM
      WHERE day = '20130604' AND log_id > '2013-06-04 11:00:00';
• TRY WITH CL.TWO: vagrant suspend node2
• Setting CL and range querying columns, losing consistency
Compound Keys
    cassandra-cli -h 192.168.33.21
    use my_app_data;
    list logs;
• See the raw Cassandra data
Code Example - Clients: Hector
• Solid Java Client
• In Use in Production
• Round Robin Node Discovery
Code Example - Clients: Astyanax
• Netflix Open Source Library
• Simpler APIs
Code Example
• Example: Storing Payment Methods
• https://github.com/neilbeveridge/example-compoundkeys
Code Example: Requirements
• Store 1-10 payment methods
• Use a single row
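Code Example: Non-CQL
• Define a composite column class
    import com.netflix.astyanax.annotations.Component;

    public static final class Composite {
        // ordinal fixes the position of each part in the composite column name
        private @Component(ordinal = 0) String paymentUuid;
        private @Component(ordinal = 1) String field;
        // constructors and getters not shown on the slide
    }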
Code Example: Writing Data
    UUID paymentUUID = TimeUUIDUtils.getUniqueTimeUUIDinMillis();
    String sPaymentUUID = paymentUUID.toString();
    // one composite column per field; a null TTL means the column does not expire
    batch.withRow(PAYMENTS_CF, userId)
        .putColumn(new Composite(sPaymentUUID, "pvtoken"), paymentInfo.pvToken, null)
        .putColumn(new Composite(sPaymentUUID, "name"), paymentInfo.name, null)
        .putColumn(new Composite(sPaymentUUID, "number"), paymentInfo.number, null);
Code Example: Reading Data
• Need some logic to handle record boundaries
    // handle the payment info boundary
    if (lastSeen != null && !column.getName().getPaymentUuid().equals(lastSeen)) {
        payments.add(payment);
        payment = new PaymentInfo();
        payment.paymentUUID = UUID.fromString(column.getName().getPaymentUuid());
    }
    lastSeen = column.getName().getPaymentUuid();
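The boundary handling above amounts to grouping consecutive columns by the first component of their composite name. As a language-neutral illustration (Python here, not the Astyanax code from the linked repository), the same idea can be sketched with `itertools.groupby`. The tuple layout of the columns is an assumption for this example.

```python
from itertools import groupby

def group_payments(columns):
    """Rebuild payment records from a row's composite columns.

    `columns` is an iterable of ((payment_uuid, field), value) pairs,
    already sorted by composite name as they are when read from a row.
    """
    payments = []
    for payment_uuid, cells in groupby(columns, key=lambda c: c[0][0]):
        payment = {"paymentUUID": payment_uuid}
        for (_, field), value in cells:
            payment[field] = value
        payments.append(payment)  # the final record is flushed here too
    return payments
```

Grouping this way also covers the last record in the row, which a hand-written "last seen" loop has to add explicitly after the loop ends.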
Code Example
• A Bit Messy
Code Example: CQL3
• Need to define a schema
• Cassandra needs it to split up the row for us