Cassandra Database Project
News Presentation: Joab Jackson, “New Cassandra Can Pack Two Billion Columns Into a Row”, PCWorld News, January 2011
Alireza Haghdoost, Jake Moroshek
Computer Science and Engineering, University of Minnesota-Twin Cities
Nov. 17, 2011
What was the Problem?
• Facebook Messages Inbox Search
  • Feature that enables users to search through their Facebook Inbox
• Millions of messages are sent every day on Facebook
• Messages are stored in different data centers
• How to handle indexing all of this information for Inbox search?
What is Cassandra?
• Distributed storage system
• Manages data as a kind of NoSQL database
  • NoSQL: key-value, schema-less database
• Scales to a very large size across many servers
  • servers are spread across different data centers
  • small and large components fail continuously
• No single point of failure
  • Data is replicated at several nodes
Cassandra Goals
• High scalability
  • The ability to scale incrementally
• High performance
  • The ability to respond quickly
• High availability
  • The ability to keep data available to users
Cassandra Data Model
• Cassandra does not support a full relational data model
• Key-value data model
  • Every row is identified by a unique key
  • Every row can hold a very large number of columns
    • columns are grouped into column families
    • a row can pack two billion columns
• Columns within a row are sorted by either
  • name order
  • time order (required for inbox search)
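The data model above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not Cassandra's implementation or API: the `Table` and `Row` classes, the `inbox` column family, and the use of integer timestamps as column names are all assumptions made for the example. It shows rows looked up by key, columns grouped into column families, and columns kept in sorted order so that a range (for instance, a time window in an inbox) can be read as one slice.

```python
import bisect


class Row:
    """One row: column families, each keeping its columns sorted by name."""

    def __init__(self):
        # family name -> (sorted list of column names, dict of name -> value)
        self.families = {}

    def insert(self, family, column, value):
        names, values = self.families.setdefault(family, ([], {}))
        if column not in values:
            # Keep the column names sorted as they arrive.
            bisect.insort(names, column)
        values[column] = value

    def slice(self, family, start, end):
        """Return (column, value) pairs with start <= column <= end, in order."""
        names, values = self.families.get(family, ([], {}))
        lo = bisect.bisect_left(names, start)
        hi = bisect.bisect_right(names, end)
        return [(name, values[name]) for name in names[lo:hi]]


class Table:
    """Hypothetical key-value table: every row is identified by a unique key."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, family, column, value):
        self.rows.setdefault(key, Row()).insert(family, column, value)

    def get_slice(self, key, family, start, end):
        row = self.rows.get(key)
        return row.slice(family, start, end) if row else []


if __name__ == "__main__":
    table = Table()
    # Assumed layout: row key = user id, column name = message timestamp.
    table.insert("user42", "inbox", 3, "msg-c")
    table.insert("user42", "inbox", 1, "msg-a")
    table.insert("user42", "inbox", 2, "msg-b")
    print(table.get_slice("user42", "inbox", 1, 2))
```

Because the column names stay sorted, a time-window query is a binary search plus a contiguous read, which is the property the "time order" bullet relies on.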
Distribution and Replication
• Data is distributed across the nodes using a consistent hashing function
• High availability is achieved using replication
  • If one storage node fails, data that has been replicated on other nodes is still available
  • Data is actively replicated at N nodes across data centers
• Replication policies:
  • Rack Unaware
  • Rack Aware
  • Datacenter Aware
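As a rough illustration of the distribution and replication ideas above, the sketch below places nodes on a hash ring and stores each key on N consecutive nodes, which corresponds to the simplest ("Rack Unaware") style of placement. The `Ring` class, the node names, the choice of SHA-256, and one ring position per node are assumptions made for the example, not details of Cassandra itself.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a node name or row key to a position on the hash ring."""
    return int.from_bytes(hashlib.sha256(name.encode("utf-8")).digest(), "big")


class Ring:
    """Hypothetical consistent-hashing ring with N-way replication."""

    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        # One position per node, sorted clockwise around the ring.
        self.tokens = sorted((ring_position(node), node) for node in nodes)

    def nodes_for(self, key):
        """Return the nodes that hold `key`: its owner plus the next successors."""
        if not self.tokens:
            return []
        positions = [position for position, _ in self.tokens]
        # First node at or after the key's position, wrapping around the ring.
        start = bisect.bisect_left(positions, ring_position(key)) % len(self.tokens)
        count = min(self.replicas, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c", "node-d"], replicas=3)
    owners = ring.nodes_for("user42")
    print("replicas for user42:", owners)
    # If the first replica fails, the remaining ones can still serve the key.
    print("still available on:", [node for node in owners if node != owners[0]])
```

Rack-aware and datacenter-aware policies would choose the extra replicas with placement constraints instead of simply taking the next nodes on the ring; that logic is left out here.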
Users of the Cassandra System
• First deployment:
  • 2008 by Facebook, inspired by Google's Bigtable and Amazon's Dynamo
  • Designed for the message inbox search system
  • Stores terabytes of indexes across a cluster of 600+ cores and 120+ TB of disk space
  • Each node can handle over 5,000 requests per second
• Well-known users:
References
• Prashant Malik, “Inbox Search”, http://ja-jp.facebook.com/blog.php?post=20387467130
• Joab Jackson, “Apache Cassandra Ready for the Enterprise”, http://www.pcworld.com/businesscenter/article/242111/apache_cassandra_ready_for_the_enterprise.html#tk.mod_rel
• Joab Jackson, “New Cassandra Can Pack Two Billion Columns Into a Row”, http://www.pcworld.com/businesscenter/article/216766/new_cassandra_can_pack_two_billion_columns_into_a_row.html
• Avinash Lakshman and Prashant Malik, “Cassandra: a decentralized structured storage system”, SIGOPS Oper. Syst. Rev. 44, 2 (April 2010), http://doi.acm.org/10.1145/1773912.1773922