The NoSQL Approach To High Scalability • Valery Pechatnikov • Web Enhanced Information Management - Spring 2011
Web Scalability • The amount of data accessible via the web has grown exponentially and continues to expand at an incredible pace • A typical user of the web today has a large digital footprint - expecting to be able to store and quickly retrieve personal journals, photos, and videos within the cloud
Examples: Facebook, Twitter • What 20 minutes on Facebook looks like (as of 12/31/10) • Shared links: 1,000,000 • Tagged photos: 1,323,000 • Event invites sent out: 1,484,000 • Wall posts: 1,587,000 • Status updates: 1,851,000 • Friend requests accepted: 1,972,000 • Photos uploaded: 2,716,000 • Comments: 10,208,000 • Messages: 4,632,000 • Facebook Messaging (as of 11/2010): Over 350 million users sending over 15 billion person-to-person messages per month • Back in 2009, Facebook had over 30,000 servers. • Today, Facebook has decided to publish the specifications of its latest custom data center as it breaks ground for massive data storage (from a hardware perspective) • Twitter is currently seeing 155 million tweets per day (April 2011), and the numbers are growing weekly.
What is High Scalability? • Many Web 2.0 applications find themselves in need of scaling beyond their existing capabilities very quickly. • Scaling approaches: • Vertical - use larger capacity, more powerful, faster hardware • Horizontal - use more hardware - distribute the workload across multiple servers • Vertical scaling quickly reaches a limit and can be expensive • Horizontal scalability is cheaper, but requires distributing data across multiple servers, which can be a complex problem.
Scalability and Relational Database Management Systems (RDBMS) • Traditional horizontal scaling - data must be partitioned across multiple servers. • Possible partitioning schemes: • Master/Slave (Write/Read) • Horizontal Table partitioning • Sharding (share nothing)
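The sharding scheme listed above can be sketched in a few lines. This is a minimal illustration, not any particular product's implementation: the server names and the `shard_for` and `partition` helpers are hypothetical, and the hash-modulo routing rule is an assumption chosen for brevity.

```python
import hashlib

# Hypothetical shard names; each shard would be an independent database server.
SHARDS = ["db-0", "db-1", "db-2"]


def shard_for(key: str, shards=SHARDS) -> str:
    """Route a key to one shard using a stable hash.

    hashlib is used instead of the built-in hash() because hash() is
    randomized per process for strings, which would break routing.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def partition(keys, shards=SHARDS):
    """Group keys by the shard that owns them (shards share nothing)."""
    placement = {name: [] for name in shards}
    for key in keys:
        placement[shard_for(key, shards)].append(key)
    return placement
```

With this simple modulo rule, changing the number of shards remaps most keys, which is one reason resharding a relational deployment tends to be painful.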
RDBMS scaling problems • However, horizontal scaling with RDBMS requires giving up some of the features that make relational datastores so popular in the first place. • Often for performance reasons, application developers give up • data normalization - in order to avoid joins, store data in a redundant way • transactionality - distributed transactions are expensive • data constraints across partitioned tables
The NoSQL Approach • What is NoSQL? • The NoSQL space is not defined by any one specific database implementation, but by the notion of choice. Although these choices aren’t meant to focus exclusively on web-scale performance, in the majority of cases the need for a new choice arose due to a previous implementation’s inability to scale readily while maintaining the necessary features (and thus was really inspired by Web 2.0) • Generally, NoSQL alternatives focus on the ability to scale horizontally from the start, and thus there is a relaxation of at least some of the typical ACID properties (Atomicity, Consistency, Isolation, Durability) and often the complexity of relationships between entities is limited. • NoSQL Alternatives: • Key-Value Databases - simplest model, representing data as a massive hash table of key-value pairs. Examples: Voldemort (LinkedIn), Dynamo (Amazon) • BigTable Databases - the core entity in this implementation is a large table with column “families” and a time dimension allowing data versioning. Examples: BigTable (Google), HBase(Powerset --> Apache Hadoop). • Document databases - Data entities are represented as documents instead of rows, which are then grouped into collections (as opposed to tables). A document in a collection doesn’t need to have a defined schema (like a row would for a table). Example: MongoDB (10gen) • Graph databases - Represent arbitrary data as nodes connected by edges, providing ability to model properties and relationships. The natural query pattern for such a database is graph traversal (as opposed to joins). Example: neo4j
Evaluation Criteria • Ease of Scaling • Does the implementation take care of distributing your data automatically? • How much overhead is there in terms of performance once the data has been distributed? • Does scaling out pose a considerable maintenance burden or does the data store solution provide configurable automated maintenance with a distributed solution in mind? • ACID: • Data Consistency • Atomicity • Isolation • Durability • Query language ease of use and maturity • Data Modeling Flexibility • Migration/Adoption feasibility
Case Study: HBase • “…Hbase tables are like those in an RDBMS, only cells are versioned, rows are sorted, and columns can be added on the fly by the client as long as the column family they belong to pre-exists.” • In the example diagram below, column family groups are defined by the prefix prior to the colon (i.e. “anchor” and “contents” are sample family groups). The row has a key in the form of com.cnn.www (the convention allows rows to be sorted in such a way that data for similar domains is stored close together). The third dimension for each cell is a timestamp, such that the data is versioned. • How well does HBase scale? • Excellently. In the HBase implementation, rows are automatically grouped into “regions” and when necessary tables are split into multiple regions, which are distributed across multiple servers seamlessly. The location of a particular region is stored within a table on a special region server, such that any region can be reached with at most 3 hops. The management of region splitting, region merging, and region rebalancing all happens automatically in the HBase implementation and doesn’t need any extra coding from the developer (just configuration of the relevant thresholds)
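The data model described on this slide (sorted rows, column families fixed up front, columns added on the fly, timestamped cell versions) can be mimicked with plain dictionaries. The sketch below is only a toy model for intuition: `VersionedTable`, `row_key`, and the `contents:html` qualifier are hypothetical names, and none of this is the HBase client API.

```python
from collections import defaultdict


def row_key(hostname: str) -> str:
    """Reverse the domain parts so hosts of the same domain sort together."""
    return ".".join(reversed(hostname.split(".")))


class VersionedTable:
    """Toy model: row -> 'family:qualifier' -> {timestamp: value}."""

    def __init__(self, families):
        # Column families must exist before any column in them is written.
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # New qualifiers inside an existing family need no schema change.
        self.rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Newest value at or before `timestamp` (newest overall if None)."""
        versions = self.rows.get(row, {}).get(column, {})
        eligible = [ts for ts in versions if timestamp is None or ts <= timestamp]
        return versions[max(eligible)] if eligible else None

    def scan(self):
        """Row keys in sorted order, as a range scan would visit them."""
        return sorted(self.rows)
```

For example, `row_key("www.cnn.com")` yields `com.cnn.www`, so rows for `www.cnn.com` and `money.cnn.com` end up adjacent in a scan.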
Case Study: HBase • In the image above, the Google version of the BigTable implementation demonstrates the structure of tablet location discovery (“region” location in HBase). • A Chubby file (ZooKeeper in HBase) is a file in the distributed lock service that stores the location of the first tablet (the root tablet). The metadata tablets (regions) can be found from the root tablet location. (The root and metadata tablets are all part of one table, where the topmost set of rows is the root). • Due to the implementation, this tree will never be allowed to become more than 3 levels deep • Due to the popularity of Hadoop and its seamless integration with HBase, many companies have taken the leap and have already adopted HBase. With Facebook as a contributor to the project there is now even more reason to believe HBase will continue to evolve and to thrive.
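The three-level location lookup described on this slide can be sketched as a chain of sorted range indexes. This is a simplified illustration under stated assumptions: the `Index` class, the `locate` function, the region boundaries, and the server names `rs1` to `rs3` are all hypothetical, and the way the three steps are counted here is one reading of the description rather than the exact protocol.

```python
import bisect


class Index:
    """Sorted map of (start_key -> target); each target covers keys >= start_key."""

    def __init__(self, entries):
        self.entries = sorted(entries, key=lambda entry: entry[0])
        self.starts = [start for start, _ in self.entries]

    def lookup(self, key):
        # Find the last range whose start key is <= the requested key.
        pos = bisect.bisect_right(self.starts, key) - 1
        return self.entries[max(pos, 0)][1]


# Hypothetical layout: one root index pointing at two metadata indexes,
# which in turn point at the servers holding the user regions.
META_A = Index([("", "rs1"), ("com.cnn", "rs2")])
META_B = Index([("org", "rs3")])
ROOT = Index([("", META_A), ("org", META_B)])


def locate(root, key):
    """Return (server, steps) for the region that holds `key`."""
    steps = 1                     # step 1: lock service file gives the root location
    meta = root.lookup(key)
    steps += 1                    # step 2: root points at the right metadata region
    server = meta.lookup(key)
    steps += 1                    # step 3: metadata points at the user region's server
    return server, steps
```

Because the hierarchy has a fixed depth, the number of steps does not grow with the number of regions; clients can also cache locations to skip the walk on later requests.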
Case Study: MongoDB • Data entities are represented as documents instead of rows, which are then grouped into collections (as opposed to tables). A document in a collection doesn’t need to have a defined schema (like a row would for a table) • How well does MongoDB scale? • Excellently. MongoDB scales horizontally via auto-sharding.
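The idea of schema-less documents grouped into a collection can be illustrated without a database at all. The `Collection` class below is a hypothetical in-memory stand-in written for this sketch; it is not MongoDB's driver API, and its `insert` and `find` methods only imitate the general shape of a document store.

```python
class Collection:
    """Toy document collection: documents are dicts with no shared schema."""

    def __init__(self):
        self.docs = []
        self._next_id = 1

    def insert(self, doc):
        """Store a copy of the document and assign it a unique _id."""
        stored = dict(doc, _id=self._next_id)
        self._next_id += 1
        self.docs.append(stored)
        return stored["_id"]

    def find(self, **criteria):
        """Return documents whose fields equal every given criterion."""
        return [
            doc for doc in self.docs
            if all(field in doc and doc[field] == value
                   for field, value in criteria.items())
        ]


posts = Collection()
# Two documents in the same collection with different sets of fields.
posts.insert({"author": "ann", "title": "Hello", "tags": ["intro"]})
posts.insert({"author": "bob", "photo_url": "/img/1.png"})
```

A query such as `posts.find(author="ann")` simply skips documents that lack the field, which is the practical consequence of having no table-level schema.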
Case Study: Voldemort • The data model for the Voldemort database is a simple key-value map. The keys and values being stored can be almost arbitrarily complex objects. • How well does Voldemort scale? • Excellently. This is the core strength of the model and allows reads and writes to scale horizontally transparently to the user (requiring only configuration) by partitioning the data across the cluster (i.e. distributed database). • Consistent hashing is used to ensure that when a server is added or removed from the cluster, rebalancing is efficient (i.e. only the necessary amount of data is rebalanced, rather than all data). • The diagram shows a sample mapping of servers A, B, C, D to the universe of possible hashes of keys. This assumes each key is stored on 3 servers for redundancy/availability purposes. • However, complex and ever-changing relationships are not modeled well via a simple key-value datastore. The application reading and writing to the database would be required to check all relationship constraints and interpret arbitrary key-value compositions. There are no triggers and no constructs that allow an application to infer meaning into the data.
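Consistent hashing, as mentioned on this slide, can be sketched with a sorted ring of hash positions. This is a generic textbook-style illustration, not Voldemort's actual code: the `HashRing` class, the use of MD5, and the `vnodes` parameter (several ring positions per server) are assumptions made for this example.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a position on the ring (stable across processes)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, servers, vnodes=64):
        self.vnodes = vnodes      # assumed: ring positions per server
        self._ring = []           # sorted list of (position, server)
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [(pos, s) for pos, s in self._ring if s != server]

    def owners(self, key, replicas=1):
        """Walk clockwise from the key's position, collecting distinct servers."""
        if not self._ring:
            return []
        start = bisect.bisect_right(self._ring, (_hash(key), ""))
        found = []
        for offset in range(len(self._ring)):
            server = self._ring[(start + offset) % len(self._ring)][1]
            if server not in found:
                found.append(server)
                if len(found) == replicas:
                    break
        return found
```

With `ring = HashRing(["A", "B", "C", "D"])`, calling `ring.owners("some-key", replicas=3)` returns the three distinct servers that would hold that key. After `ring.add("E")`, the only keys whose primary owner changes are those now owned by `E`; everything else stays where it was.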
Case Study: neo4j • Unlike the previously discussed data models, the graph database model thrives on relationships and associations. Neo4j’s data model allows a user to model nodes and relationships (edges), with attributes (key-value pairs) on both nodes and edges. • How well does Neo4j scale? • OK. Neo4j claims to be massively scalable, but at the same time concedes that it is the least scalable of the 4 types of NoSQL databases being discussed. The scalability of Neo4j is on the order of billions of nodes and relationships on a single machine. As with relational databases, it’s possible to use sharding to scale to multiple machines as well. Sharding a graph database is hard due to the nature of the graph data model.
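The property-graph model and traversal-style querying described on this slide can be sketched as follows. The `Graph` class and the node and relationship names are hypothetical and exist only for this illustration; this is not the Neo4j API.

```python
from collections import deque


class Graph:
    """Toy property graph: attributes on both nodes and relationships."""

    def __init__(self):
        self.nodes = {}   # node id -> properties
        self.edges = {}   # node id -> list of (rel_type, target, properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props
        self.edges.setdefault(node_id, [])

    def relate(self, source, rel_type, target, **props):
        self.edges[source].append((rel_type, target, props))

    def traverse(self, start, rel_type, max_depth):
        """Breadth-first: nodes reachable via `rel_type` within `max_depth` hops."""
        seen = {start}
        queue = deque([(start, 0)])
        reached = []
        while queue:
            node, depth = queue.popleft()
            if depth == max_depth:
                continue
            for kind, target, _props in self.edges[node]:
                if kind == rel_type and target not in seen:
                    seen.add(target)
                    reached.append(target)
                    queue.append((target, depth + 1))
        return reached


g = Graph()
for name in ("alice", "bob", "carol", "dave", "acme"):
    g.add_node(name)
g.relate("alice", "KNOWS", "bob", since=2009)
g.relate("bob", "KNOWS", "carol", since=2010)
g.relate("carol", "KNOWS", "dave", since=2011)
g.relate("alice", "WORKS_AT", "acme")
```

A "friends of friends" query is then `g.traverse("alice", "KNOWS", 2)`, which follows edges directly instead of joining tables. It also hints at why sharding is hard: a traversal may cross whatever partition boundary is chosen.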
Conclusion • Relational databases will not be made obsolete anytime soon by the NoSQL alternatives. • However, for certain special use cases the features offered by NoSQL offerings make the trade-offs worth it, and in some cases there is a clear win by going the non-traditional route. As more types of data stores become available, with varied APIs and even lower barriers to entry, the relational database will cease to be the default choice for every application. • NoSQL has already proven to be a practical alternative for the needs of Web giants such as Facebook and Twitter and should continue to expand in its scope such that it will be more than just a passing fad.
REFERENCES • Averbuch, Alex. < http://alexaverbuch.blogspot.com/2010/10/thesis-report-available.html> • Brewer E. Towards Robust Distributed Systems. PODC KeyNote speech. <http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf> • Eifrem, Emil. A NOSQL Overview And The Benefits Of Graph Databases (nosql East 2009). Digital image. Web. 08 Mar. 2011. <http://www.slideshare.net/emileifrem/nosql-east-a-nosql-overview-and-the-benefits-of-graph-databases>. • Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. 2006. Bigtable: a Distributed Storage System for Structured Data. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI '06). USENIX Association, Berkeley, CA, USA, 205-218 • Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swami Sivasubramanian, Peter Vosshall and Werner Vogels, “Dynamo: Amazon's Highly Available Key-Value Store”, in the Proceedings of the 21st ACM Symposium on Operating Systems Principles, Stevenson, WA, October 2007. • Hoff, Todd. "Facebook's New Real-time Messaging System: HBase to Store 135+ Billion Messages A Month." High Scalability. 16 Nov. 2010. Web. Feb.-Mar. 2011. <http://highscalability.com/blog/2010/11/16/facebooks-new-real-time-messaging-system-hbase-to-store-135.html>. • Kreps, Jay. "Project Voldemort: Scaling Simple Storage at LinkedIn." The LinkedIn Blog. LinkedIn.com, 20 Mar. 2009. Web. 07 Mar. 2011. <http://blog.linkedin.com/2009/03/20/project-voldemort-scaling-simple-storage-at-linkedin/>. • Lehnardt, Jan. "NoSQL Is About..." Blog.couchone.com. Couchone.com, 10 Apr. 2010. Web. 21 Feb. 2011. <http://blog.couchone.com/post/511008668/nosql-is-about>. • Metz, Cade. "HBase: Shops Swap MySQL for Open Source Google Mimic." Web log post. The Register: Sci/Tech News for the World. 19 Jan. 2011. Web. 07 Mar. 2011. 
<http://www.theregister.co.uk/2011/01/19/hbase_on_the_rise/>. • Muthukkaruppan, Kannan. "The Underlying Technology of Messages." Web log post. Facebook. 15 Nov. 2010. Web. Feb.-Mar. 2011. <http://www.facebook.com/note.php?note_id=454991608919>. • "Production Deployments." MongoDB. Web. 07 Mar. 2011. <http://www.mongodb.org/display/DOCS/Production Deployments>. • Valentin Kuznetsov, Dave Evans, Simon Metson, The CMS Data Aggregation System, Procedia Computer Science, Volume 1, Issue 1, ICCS 2010, May 2010, Pages 1535-1543, ISSN 1877-0509, DOI: 10.1016/j.procs.2010.04.172. http://www.sciencedirect.com/science/article/B9865-506HM1Y-63/2/9dc3f9e43de325f9d192714f2b464176 • Webber, Jim. “On Sharding Graph Databases.” Jim.Webber.Name, 16 Feb. 2011. <http://jim.webber.name/2011/02/16/3b8f4b3d-c884-4fba-ae6b-7b75a191fa22.aspx> • Webber, Jim. “Scaling Neo4j with Cache Sharding and Neo4j HA.” Jim.Webber.Name, 23 Feb. 2011. < http://jim.webber.name/2011/02/23/abe72f61-27fb-4c1b-8ce1-d0db7583497b.aspx> • White, Tom. Hadoop: the Definitive Guide. Sebastopol, CA: O'Reilly, 2009. • NOSQL Databases. 15 Feb. 2011 <http://nosql-database.org/>.