Gary Dusbabek Rackspace

Apache Cassandra. Gary Dusbabek, Rackspace. Silicon Valley Cloud Computing Group • 17 June 2010.

Presentation Transcript


  1. Apache Cassandra • Gary Dusbabek, Rackspace • Silicon Valley Cloud Computing Group • 17 June 2010

  2. Outline • History • Scaling • Replication Model • Data Model • Tuning • Write Path • Read Path • Client Access • Practical Considerations

  3. Outline • History • Scaling • Replication Model • Data Model • Tuning • Write Path • Read Path • Client Access • Practical Considerations

  4. Why Cassandra? • 2006: 161 EB (322 million 500 GB drives) • 2010: 988 EB (1.98 billion 500 GB drives) • Six-fold growth in four years • Source: http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf

  5. Why Cassandra?

  6. SQL • Specialized data structures (think B-trees) • Shines with complicated queries • Focus on fast query & analysis • Not necessarily on large datasets

  7. Ever tried scaling an RDBMS? • For reads? • Memcache etc. • For writes? • Oh noes!

  8. Vertical Scaling Is hard (image credit: janetmck via flickr)

  9. No, really: Vertical Scaling Is hard

  10. Enter Cassandra • Amazon Dynamo • Consistent hashing • Partitioning • Replication • One-hop routing • Google BigTable • Column Families • Memtables • SSTables

  11. Origins Pre-2008

  12. Moving Along 2008

  13. Landed 2009

  14. Outline • History • Scaling • Replication Model • Data Model • Tuning • Write Path • Read Path • Client Access • Practical Considerations

  15. Distributed and Scalable • Horizontal! • All nodes are identical • No master or SPOF • Adding is simple • Automatic cluster maintenance

  16. Outline • History • Scaling • Replication Model • Data Model • Tuning • Write Path • Read Path • Client Access • Practical Considerations

  17. Replication • Replication factor • How many nodes data is replicated on • Consistency level • Zero, One, Quorum, All • Sync or async for writes • Reliability of reads • Read repair

  18. Ring Topology (RF=3) • Conceptual ring • One token per node • Multiple ranges per node • (Diagram: ring of nodes a, d, g, j)

  19. Ring Topology (RF=2) • Conceptual ring • One token per node • Multiple ranges per node • (Diagram: ring of nodes a, d, g, j)
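
The ring on slides 18 and 19 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a node owns the range that ends at its own token, the remaining replicas sit on the next nodes clockwise, and keys are compared to tokens directly (no hashing). `replicas_for` is a hypothetical helper, not a Cassandra API.

```python
from bisect import bisect_left

# Tokens from the slide's diagram; one token per node, kept sorted.
TOKENS = ["a", "d", "g", "j"]

def replicas_for(key, tokens=TOKENS, rf=3):
    """Return the tokens of the nodes that hold `key` (hypothetical helper).

    Assumption: a node owns the range (previous token, own token], and the
    remaining rf - 1 replicas are the next nodes clockwise on the ring.
    """
    start = bisect_left(tokens, key) % len(tokens)  # wrap past the last token
    return [tokens[(start + i) % len(tokens)] for i in range(rf)]

print(replicas_for("c", rf=3))  # ['d', 'g', 'j']
print(replicas_for("k", rf=2))  # ['a', 'd']
```

With RF=3 every node ends up storing its own range plus the ranges of the two nodes before it, which is one way to read "multiple ranges per node".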

  20. New Node (RF=3) • Token assignment • Range adjustment • Bootstrap • Arrival only affects immediate neighbors • (Diagram: ring of nodes a, d, g, j with new node m)
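
A small sketch of why a new node only disturbs its neighbors, again assuming a node owns the range that ends at its token and that keys are compared to tokens directly. `primary_owner` is a hypothetical helper and the sample keys are made up for illustration.

```python
from bisect import bisect_left

def primary_owner(key, tokens):
    """Token of the node whose range (previous token, own token] contains key."""
    return tokens[bisect_left(tokens, key) % len(tokens)]

before = ["a", "d", "g", "j"]
after = sorted(before + ["m"])  # new node bootstraps with token "m"

keys = ["b", "e", "h", "k", "n"]
moved = [k for k in keys if primary_owner(k, before) != primary_owner(k, after)]
print(moved)  # ['k']
```

Only keys in the range (j, m] change primary owner, from node a to node m. With RF greater than 1 the replica sets of a few adjacent ranges shift as well, but the rest of the ring is untouched.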

  21. Ring Partition (RF=3) • Node dies • Available? • Hinted handoff • Achtung! Plan for this • (Diagram: ring of nodes a, d, g, j)

  22. Outline • History • Scaling • Replication Model • Data Model • Tuning • Write Path • Read Path • Client Access • Practical Considerations

  23. Schema-free Sparse-table • Flexible column naming • You define the sort order • Not required to have a specific column just because another row does

  24. Data Model • Keyspace • ColumnFamily • Row (indexed) • Key • Columns • Name (sorted) • Value

  25. Easier to show from the bottom up

  26. Data Model A single column

  27. Data Model A single row

  28. Data Model
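
Since the diagrams for slides 26 to 28 did not survive in the transcript, here is a rough sketch of the hierarchy from slide 24 as nested Python dictionaries. The column family name, row keys, and column names are invented for illustration, and per-column details such as timestamps are left out.

```python
# Keyspace -> ColumnFamily -> row key -> {column name: value}
keyspace = {
    "Users": {                       # ColumnFamily (hypothetical name)
        "alice": {"email": "alice@example.com", "city": "Austin"},
        "bob": {"email": "bob@example.com"},   # sparse: no "city" column
    }
}

def get_row(ks, column_family, key):
    """Return a row's columns as (name, value) pairs sorted by column name."""
    return sorted(ks[column_family][key].items())

print(get_row(keyspace, "Users", "alice"))
# [('city', 'Austin'), ('email', 'alice@example.com')]
```

Note that the row for "bob" simply omits the "city" column, which is the sparse-table property from slide 23.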

  29. Outline • History • Scaling • Replication Model • Data Model • Tuning • Write Path • Read Path • Client Access • Practical Considerations

  30. Eventually Consistent • CAP Theorem • Consistency • Availability • Partition Tolerance • Choose two • Cassandra chooses A and P But…

  31. Eventually Consistent I got a fever! And the only prescription is MORE CONSISTENCY!

  32. Tunable Consistency • Give up a little A and P to get more C • Ratchet up the consistency level • R + W > N ⇒ Strong consistency • More to come
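
The R + W > N rule can be checked with simple arithmetic. In the sketch below, N is taken to be the replication factor, W the number of replicas that must acknowledge a write, and R the number of replicas consulted on a read; the function names are hypothetical.

```python
def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1

def is_strongly_consistent(r, w, n):
    """True when every read set must overlap every write set."""
    return r + w > n

n = 3
q = quorum(n)                            # 2
print(is_strongly_consistent(q, q, n))   # True:  2 + 2 > 3
print(is_strongly_consistent(1, 1, n))   # False: 1 + 1 <= 3
```

Reading and writing at Quorum with N=3 gives overlap on at least one replica, while reading and writing at One does not.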

  33. Outline • History • Scaling • Replication Model • Data Model • Tuning • Write Path • Read Path • Client Access • Practical Considerations

  34. Inserting: Overview • Simple: put(key, col, value) • Complex: put(key, [col:value, …, col:value]) • Batch: multi key.

  35. Inserting: Writes • Commit log for durability • Configurable fsync • Sequential writes only • Memtable – no disk access (no reads or seeks) • SSTables are final (read-only once written) • Indexes • Bloom filter • Raw data • Bottom line: FAST!!!
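
A toy model of the write path described above, assuming the order is commit log first, then memtable, then a flush to an immutable SSTable once the memtable grows large enough. `ToyStore` and its flush threshold are hypothetical; the in-memory list only stands in for an append-only file, and index and Bloom filter files are not modeled.

```python
class ToyStore:
    """Toy model of the write path: commit log -> memtable -> SSTable."""

    def __init__(self, flush_threshold=2):
        self.commit_log = []      # stands in for an append-only file on disk
        self.memtable = {}        # in-memory, (key, column) -> value
        self.sstables = []        # each flushed table is an immutable tuple
        self.flush_threshold = flush_threshold

    def put(self, key, column, value):
        self.commit_log.append((key, column, value))  # sequential append first
        self.memtable[(key, column)] = value          # then memory only
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memtable out sorted; the result is never modified again.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

store = ToyStore()
store.put("a", "bar", 1)
store.put("a", "cat", 2)   # reaches the threshold and triggers a flush
store.put("b", "bar", 3)
print(len(store.sstables), len(store.memtable), len(store.commit_log))  # 1 1 3
```

No step in `put` reads from disk or seeks, which is the point of the slide's "sequential writes only".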

  36. Outline • History • Scaling • Replication Model • Data Model • Tuning • Write Path • Read Path • Client Access • Practical Considerations

  37. Querying: Overview • You need a key or keys: • Single: key=‘a’ • Range: key=‘a’ through ’f’ • And columns to retrieve: • Slice: cols={bar through kite} • By name: key=‘b’ cols={bar, cat, llama} • Nothing like SQL “WHERE col=‘faz’” • But secondary indices are being worked on (see CASSANDRA-749)
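
The two column predicates from the slide, slice and by name, can be sketched against a single row whose columns are kept sorted by name. The column names come from the slide; the values and both helper functions are hypothetical and are not the Thrift API.

```python
from bisect import bisect_left, bisect_right

# One row's columns, stored sorted by name (names taken from the slide).
row_b = [("bar", 1), ("cat", 2), ("kite", 3), ("llama", 4)]

def slice_columns(columns, start, finish):
    """Columns whose names fall in [start, finish], using the sort order."""
    names = [name for name, _ in columns]
    return columns[bisect_left(names, start):bisect_right(names, finish)]

def columns_by_name(columns, wanted):
    """Only the named columns, in stored order."""
    return [(n, v) for n, v in columns if n in wanted]

print(slice_columns(row_b, "bar", "kite"))
# [('bar', 1), ('cat', 2), ('kite', 3)]
print(columns_by_name(row_b, {"bar", "cat", "llama"}))
# [('bar', 1), ('cat', 2), ('llama', 4)]
```

Both lookups go by column name, never by column value, which is why there is no equivalent of SQL's WHERE col='faz' here.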

  38. Querying: Reads • Practically lock free • SSTable proliferation • New in 0.6: • Row cache (avoid SSTable lookup, not write-through) • Key cache (avoid index scan)

  39. Outline • History • Scaling • Replication Model • Data Model • Tuning • Write Path • Read Path • Client Access • Practical Considerations

  40. Client API (Low Level) • Fat Client • Live non-storage node • Reduced RPC overhead • Thrift (12 language bindings!) • http://incubator.apache.org/thrift/ • No streaming • Avro • Work in progress

  41. Client API (High Level) • http://wiki.apache.org/cassandra/ClientOptions • Feature rich • Connection pooling • Load balancing/failover • Simplified APIs • Version opaque

  42. Outline • History • Scaling • Replication Model • Data Model • Tuning • Write Path • Read Path • Client Access • Practical Considerations

  43. Practical Considerations • Partitioner: Random or Order Preserving • Range queries • Provisioning • Virtual or bare metal • Cluster size • Data model • Think in terms of access • Giving up transactions, ad-hoc queries, arbitrary indexes and joins • (you may already do this with an RDBMS!)

  44. Practical Considerations • Wide rows • Data life-span • Cluster planning • Bootstrapping

  45. Future Direction • Vector clocks (server side conflict resolution) • Alter keyspace/column families on a live cluster • Compression • Multi-tenant features • Fewer memory restrictions

  46. Wrapping Up • Use Cassandra if you want/need • High write throughput • Near-linear scalability • Automated replication/fault tolerance • And can tolerate missing RDBMS features

  47. Questions? Linkage • wiki.apache.org/cassandra • cassandra.apache.org • gdusbabek@gmail.com • gdusbabek on twitter and just about everything else.
