Apache Cassandra • Gary Dusbabek, Rackspace • Silicon Valley Cloud Computing Group • 17 June 2010
Outline • History • Scaling • Replication Model • Data Model • Tuning • Write Path • Read Path • Client Access • Practical Considerations
Why Cassandra? • 2006: 161 EB (322 million 500 GB drives) • 2010: 988 EB (1.98 billion 500 GB drives) • Six-fold growth in 4 years • Source: http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf
SQL • Specialized data structures (think B-trees) • Shines with complicated queries • Focus on fast query and analysis • Not necessarily on large datasets
Ever tried scaling an RDBMS? • For reads? Memcache etc. • For writes? Oh noes!
Vertical Scaling Is Hard (image credit: janetmck via flickr)
No, really: Vertical Scaling Is hard
Enter Cassandra • Amazon Dynamo: consistent hashing, partitioning, replication, one-hop routing • Google BigTable: column families, memtables, SSTables
Origins Pre-2008
Moving Along 2008
Landed 2009
Distributed and Scalable • Horizontal! • All nodes are identical • No master or SPOF • Adding nodes is simple • Automatic cluster maintenance
Replication • Replication factor: how many nodes data is replicated on • Consistency level: Zero, One, Quorum, All (sync or async for writes, reliability of reads) • Read repair
Ring Topology (RF=3) • Conceptual ring • One token per node • Multiple ranges per node • (diagram: nodes a, d, g, j)
Ring Topology (RF=2) • Conceptual ring • One token per node • Multiple ranges per node • (diagram: nodes a, d, g, j)
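A minimal sketch of the ring idea, not Cassandra's actual code: it assumes each node's token is simply its letter from the diagram, that keys compare as plain strings, and that replicas go on the next nodes clockwise (the simplest placement). The name `replicas_for` is hypothetical.

```python
import bisect

# Hypothetical four-node ring using the slide's labels as tokens.
TOKENS = ["a", "d", "g", "j"]

def replicas_for(key, tokens=TOKENS, rf=3):
    """Return the rf nodes holding `key`: the owner, then its clockwise successors."""
    ring = sorted(tokens)
    # The owner is the first node whose token is >= the key; wrap past the end.
    start = bisect.bisect_left(ring, key) % len(ring)
    return [ring[(start + i) % len(ring)] for i in range(rf)]

print(replicas_for("e"))        # ['g', 'j', 'a']: "e" falls in the range (d, g]
print(replicas_for("z", rf=2))  # ['a', 'd']: wraps around the ring
```

With RF=3, node a ends up storing its own range plus the ranges owned by its two predecessors, which is one way to read "multiple ranges per node".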
New Node (RF=3) • Token assignment • Range adjustment • Bootstrap • Arrival only affects immediate neighbors • (diagram: nodes a, d, g, j plus new node m)
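To illustrate why a new node only disturbs its neighbors, here is a small sketch under simplifying assumptions: tokens are the diagram's letters, keys are single lowercase letters, and only the primary owner of each key is tracked (replica placement is ignored). The helper `owner` is hypothetical.

```python
import bisect

def owner(key, tokens):
    """Primary owner of `key`: the first node whose token is >= the key, wrapping."""
    ring = sorted(tokens)
    return ring[bisect.bisect_left(ring, key) % len(ring)]

before = ["a", "d", "g", "j"]
after = before + ["m"]  # node m joins between j and a

keys = "abcdefghijklmnopqrstuvwxyz"
moved = {
    k: (owner(k, before), owner(k, after))
    for k in keys
    if owner(k, before) != owner(k, after)
}
print(moved)  # only keys in (j, m] change owner, from a to m
```

Every other range keeps its owner, so only node a hands data over to m in this simplified picture.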
Ring Partition (RF=3) • Node dies • Available? • Hinted handoff • Achtung! Plan for this • (diagram: nodes a, d, g, j)
Schema-free Sparse-table • Flexible column naming • You define the sort order • Not required to have a specific column just because another row does
Data Model • Keyspace • ColumnFamily • Row (indexed): key and columns • Column: name (sorted) and value
Data Model A single column
Data Model A single row
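One way to picture the hierarchy is as nested maps. The sketch below is only an illustration of the shape, assuming plain Python dicts; the column family name `Users`, the row keys, and the helpers `insert` and `get_row` are all made up for the example.

```python
# Keyspace -> ColumnFamily -> row key -> {column name: value}
keyspace = {"Users": {}}  # hypothetical column family

def insert(cf, key, columns):
    """Add or overwrite columns on a row; rows need not share column names."""
    keyspace[cf].setdefault(key, {}).update(columns)

def get_row(cf, key):
    """Return a row's columns as (name, value) pairs sorted by column name."""
    return sorted(keyspace[cf][key].items())

insert("Users", "row1", {"state": "TX", "email": "row1@example.com"})
insert("Users", "row2", {"email": "row2@example.com"})  # sparse: no "state"

print(get_row("Users", "row1"))
print(get_row("Users", "row2"))
```

The second row simply omits the `state` column, which is the sparse-table property described earlier.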
Eventually Consistent • CAP Theorem: Consistency, Availability, Partition Tolerance; choose two • Cassandra chooses A and P • But…
Eventually Consistent I got a fever! And the only prescription is MORE CONSISTENCY!
Tunable Consistency • Give up a little A and P to get more C • Ratchet up the consistency level • R + W > N gives strong consistency (R = replicas read, W = replicas written, N = replication factor) • More to come
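The R + W > N rule is easy to check by hand. A minimal sketch, assuming quorum means a strict majority of the N replicas; the function names are hypothetical.

```python
def quorum(n):
    """Smallest strict majority of n replicas."""
    return n // 2 + 1

def is_strongly_consistent(r, w, n):
    """True when every read set must overlap every write set."""
    return r + w > n

n = 3
q = quorum(n)
print(q)                                # 2
print(is_strongly_consistent(q, q, n))  # True: quorum reads + quorum writes
print(is_strongly_consistent(1, 1, n))  # False: One/One may miss the latest write
print(is_strongly_consistent(1, n, n))  # True: read One, write All
```

The overlap is the point: if R + W > N, at least one replica in any read set has seen the most recent successful write.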
Inserting: Overview • Simple: put(key, col, value) • Complex: put(key, [col:value, …, col:value]) • Batch: multi key.
Inserting: Writes • Commit log for durability (configurable fsync, sequential writes only) • Memtable: no disk access (no reads or seeks) • SSTables are final (become read-only): index, Bloom filter, raw data • Bottom line: FAST!!!
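The write path can be sketched as a toy in-memory store. This is a simplification, not Cassandra's implementation: the commit log is a Python list rather than a file, the flush threshold counts rows, the index and Bloom filter are omitted, and the class name `TinyStore` is invented for the example.

```python
class TinyStore:
    """Toy write path: commit log append, memtable update, flush to immutable data."""

    def __init__(self, flush_threshold=2):
        self.commit_log = []   # stands in for the sequential on-disk log
        self.memtable = {}     # row key -> {column name: value}
        self.sstables = []     # each entry is an immutable, sorted tuple
        self.flush_threshold = flush_threshold

    def put(self, key, col, value):
        self.commit_log.append((key, col, value))       # 1. durability first
        self.memtable.setdefault(key, {})[col] = value  # 2. in-memory only
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Freeze the memtable: rows sorted by key, columns sorted by name.
        frozen = tuple(
            (key, tuple(sorted(cols.items())))
            for key, cols in sorted(self.memtable.items())
        )
        self.sstables.append(frozen)  # never modified after this point
        self.memtable = {}
        self.commit_log.clear()       # flushed data no longer needs the log

store = TinyStore()
store.put("b", "x", 1)
store.put("a", "y", 2)  # second row reaches the threshold and triggers a flush
print(store.sstables)
```

Nothing in `put` reads from disk or seeks, which is the property the slide credits for write speed.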
Querying: Overview • You need a key or keys: single (key=‘a’) or range (key=‘a’ through ‘f’) • And columns to retrieve: slice (cols={bar through kite}) or by name (key=‘b’ cols={bar, cat, llama}) • Nothing like SQL “WHERE col=‘faz’” • But secondary indices are being worked on (see CASSANDRA-749)
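Because column names are kept sorted within a row, a slice is just a contiguous run of names. The sketch below assumes a row held as a plain dict and treats both slice bounds as inclusive; the row contents and the helper names are hypothetical.

```python
import bisect

# Hypothetical row: column name -> value.
row = {"ant": 1, "bar": 2, "cat": 3, "kite": 4, "llama": 5}

def slice_columns(row, start, finish):
    """Columns whose names fall between start and finish, both inclusive."""
    names = sorted(row)
    lo = bisect.bisect_left(names, start)
    hi = bisect.bisect_right(names, finish)
    return [(name, row[name]) for name in names[lo:hi]]

def columns_by_name(row, wanted):
    """Only the named columns, skipping any the row does not have."""
    return [(name, row[name]) for name in wanted if name in row]

print(slice_columns(row, "bar", "kite"))               # bar, cat, kite
print(columns_by_name(row, ["bar", "cat", "llama"]))   # bar, cat, llama
```

Neither helper can answer "which rows have col = 'faz'", which is the gap a secondary index would fill.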
Querying: Reads • Practically lock-free • SSTable proliferation • New in 0.6: row cache (avoid SSTable lookup, not write-through) and key cache (avoid index scan)
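A rough sketch of why SSTable proliferation hurts reads and what a row cache saves. It assumes SSTables and the memtable are plain dicts, and that newer tables simply override older ones (the real system reconciles columns by timestamp); `read_row` is a hypothetical helper.

```python
def read_row(key, row_cache, memtable, sstables):
    """Merge a row from every SSTable plus the memtable, unless it is cached."""
    if key in row_cache:
        return row_cache[key]          # cache hit: no SSTable lookups at all
    merged = {}
    for table in sstables:             # oldest first; cost grows with table count
        merged.update(table.get(key, {}))
    merged.update(memtable.get(key, {}))  # most recent writes win
    row_cache[key] = merged
    return merged

sstables = [{"k": {"a": 1, "b": 1}}, {"k": {"b": 2}}]
memtable = {"k": {"c": 3}}
cache = {}
print(read_row("k", cache, memtable, sstables))  # {'a': 1, 'b': 2, 'c': 3}
```

The more SSTables a row is spread across, the more lookups an uncached read needs.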
Client API (Low Level) • Fat client: live non-storage node, reduced RPC overhead • Thrift (12 language bindings!): http://incubator.apache.org/thrift/, no streaming • Avro: work in progress
Client API (High Level) • http://wiki.apache.org/cassandra/ClientOptions • Feature rich • Connection pooling • Load balancing/failover • Simplified APIs • Version opaque
Practical Considerations • Partitioner: Random or Order Preserving (range queries) • Provisioning: virtual or bare metal, cluster size • Data model: think in terms of access; giving up transactions, ad-hoc queries, arbitrary indexes and joins (you may already do this with an RDBMS!)
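The partitioner trade-off can be sketched by comparing two token functions. This is an illustration only: it assumes an MD5-derived integer token for the random case and the raw key as the token for the order-preserving case, and both function names are hypothetical.

```python
import hashlib

def random_token(key):
    """Hash-based token: spreads keys evenly but scrambles their order."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

def order_preserving_token(key):
    """The key is its own token: neighbors stay adjacent on the ring."""
    return key

keys = ["apple", "banana", "cherry"]

# Ring order under each partitioner.
print(sorted(keys, key=order_preserving_token))  # same as alphabetical order
print(sorted(keys, key=random_token))            # order depends on the hashes
```

Range queries over keys are only meaningful when ring order matches key order, which is why they are tied to the order-preserving choice; the price is that skewed keys can load nodes unevenly.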
Practical Considerations • Wide rows • Data life-span • Cluster planning • Bootstrapping
Future Direction • Vector clocks (server-side conflict resolution) • Alter keyspace/column families on a live cluster • Compression • Multi-tenant features • Fewer memory restrictions
Wrapping Up • Use Cassandra if you want/need: high write throughput, near-linear scalability, automated replication/fault tolerance • And if you can tolerate missing RDBMS features
Questions? Linkage • wiki.apache.org/cassandra • cassandra.apache.org • gdusbabek@gmail.com • gdusbabek on twitter and just about everything else.