1 / 39

Modeling in Cassandra (C*)

Phoenix Cassandra Users - July 2013. Modeling in Cassandra (C*). All examples using C* 1.2.x and CQL3. Agenda. Me A bit about our use case Data Model Modeling Closing thoughts Q&A References. Me.

ozzy
Download Presentation

Modeling in Cassandra (C*)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Phoenix Cassandra Users - July 2013 Modeling in Cassandra (C*) • All examples using C* 1.2.x and CQL3

  2. Agenda • Me • A bit about our use case • Data Model • Modeling • Closing thoughts • Q&A • References

  3. Me • 15+ years developing enterprise software applications for Supply Chain, Finance, and Human Resources • Interested in Big Data technologies, Agile methodologies, Math, Robotics, Music, Cycling • LinkedIn: http://linkd.in/18rG6oz

  4. A bit about our use case • Currently ingesting ~70 million records a day and growing as we get more customers • Using C* 1.1 (DataStax 2.2) and using CQL3 beta; planning to move to 1.2 • Need to delete data without impacting the system • Data size is 1TB for ~30 days worth of customer data • One use case needs to produce customer reports within 2 hours after the data has been ingested • Currently looking back 15 days, but need to be able to analyze and lookup data up to 180 days • Some use cases require up to 13 months

  5. Data Model

  6. Data Model

  7. Data Model Don’t use this to design your model in C*!

  8. Data Model • Map of a map • Map<RowKey, SortedMap<ColumnKey, ColumnValue>>

  9. Data Model • Benefits: • Efficient key lookup • Efficient scans due to natural sort of the columns • Max number of cells (rows x columns) in a single partition is 2 billion

  10. Modeling

  11. Modeling • ... but isn’t this NoSQL?

  12. Modeling • Business requirements • Data structure

  13. Modeling • Relational model for “library” system

  14. Modeling • what users are reading this book?

  15. Modeling • who is reading this book?

  16. Modeling • C* model looks similar CREATE TABLE books ( ISBN text, title text, pub_date varchar, PRIMARY KEY (ISBN)); CREATE TABLE users ( userid text, fname text, lname text, PRIMARY KEY (userid));

  17. Modeling • But ... there are no joins in C*!!

  18. Modeling • However, all the users for a book are in a single row! • Sequential reads Wide row! CREATE TABLE users_by_book ( ISBN text, userid text, checkout_dt timeuuid, PRIMARY KEY (ISBN, userid));

  19. Modeling • But you just said there are no joins in C*! • De-normalization is the way, depending on the use case • C* is great for writes CREATE TABLE users_by_book ( ISBN text, userid text, fname text, lname text, title text, checkout_dt timeuuid, PRIMARY KEY (ISBN, userid));

  20. Modeling • We need to know the books a user is reading CREATE TABLE books_by_user ( ISBN text, userid text, fname text, lname text, title text, checkout_dt timeuuid, PRIMARY KEY (userid, ISBN));

  21. Modeling • what if I want the last 5 books a user has recently checked out?

  22. Modeling • what if I want the last 5 books a user has recently checked out? CREATE TABLE books_by_user ( ISBN text, userid text, fname text, lname text, title text, checkout_dt timestamp, PRIMARY KEY (userid, checkout_dt, ISBN)) WITH CLUSTERING ORDER BY (checkout_dt DESC); Sort columns in reverse order

  23. Modeling • what if I want the last 5 books a user has recently checked out? CREATE TABLE books_by_user ( ISBN text, userid text, fname text, lname text, title text, checkout_dt timestamp, PRIMARY KEY (userid, checkout_dt, ISBN)) WITH CLUSTERING ORDER BY (checkout_dt DESC); Sort columns in reverse order select userid, ISBN, title from books_by_user where userid = ‘Ivanovich’ LIMIT 5;

  24. Modeling ... so far • Don’t think relational tables • Start with entities and relationships, if possible • Use query patterns • Duplicate and de-normalize data as needed

  25. Modeling ... a bit more • We want to know books checked out by branch and date

  26. Modeling ... a bit more • We want to know books checked out by branch and date CREATE TABLE books_by_branch_date ( ISBN text, branch text, checkout_date varchar, title text, PRIMARY KEY (branch, checkout_date, ISBN));

  27. Modeling ... a bit more • We want to know books checked out by branch and date CREATE TABLE books_by_branch_date ( ISBN text, branch text, checkout_date varchar, title text, PRIMARY KEY (branch, checkout_date, ISBN)); Wait! This will cluster all the books for all dates in a single row for the branch!

  28. Modeling ... a bit more • We want to know books checked out by branch and date CREATE TABLE books_by_branch_date ( ISBN text, branch text, checkout_date varchar, title text, PRIMARY KEY ((branch, checkout_date), ISBN));

  29. Modeling ... a bit more • We want to know books checked out by branch and date CREATE TABLE books_by_branch_date ( ISBN text, branch text, checkout_date varchar, title text, PRIMARY KEY ((branch, checkout_date), ISBN)); Composite row key!

  30. Modeling ... a bit more • Let’s add some tags to our books ... CREATE TABLE books ( ISBN text, title text, pub_date varchar, tags set<text>, PRIMARY KEY (ISBN));

  31. Modeling ... a bit more • Let’s add some tags to our books ... CREATE TABLE books ( ISBN text, title text, pub_date varchar, tags set<text>, PRIMARY KEY (ISBN)); Collections! Set, List, and Map

  32. Modeling ... a bit more • Let’s add some tags to our books ... CREATE TABLE books ( ISBN text, title text, pub_date varchar, tags set<text>, PRIMARY KEY (ISBN)); Collections! Set, List, and Map cqlsh:test> select * from books; isbn | pub_date | tags | title---------------------+-------------+-------------------+------------------ 1-123456-89762 | 09011963 | {Heroes, Marvel} | Avengers 101

  33. Modeling ... a bit more • Purge data older than 30 days in our book by branch and date tracking

  34. Modeling ... a bit more • Purge data older than 30 days in our book by branch and date tracking • Try to delete data by date? Hmm ...

  35. Modeling ... a bit more • Purge data older than 30 days in our book by branch and date tracking • TTL to the rescue!

  36. Modeling ... a bit more • Purge data older than 30 days in our book by branch and date tracking • TTL to the rescue! insert into books_by_branch_date( isbn, branch, checkout_date, title) values ('1-123456-89762', 'Mustang','07222013', 'Avengers 101') USING TTL 2592000; // 30 days in seconds

  37. Closing thoughts • Think about query patterns and data usages • Try out things and see what works for your use case • Use UUID (Universal Unique ID) - 128 bit number for surrogate keys • Use natural keys with caution • Use latest C* drivers (Java, Python, C#) • Learn THY C*!! • Don’t be shy and ask the community when in doubt

  38. Q & A

  39. References • http://wiki.apache.org/cassandra/DataModel • http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/ • http://www.datastax.com/documentation/cassandra/1.2/

More Related