
No SQL is not about SQL


helen

Presentation Transcript


  1. No SQL is not about SQL

  2. No SQL is a Zoo • Key-Value Stores • Wide Column Stores • SimpleDB • BigTable • Azure Table • Document Stores • Graph Databases

  3. Why not Traditional RDBMSs? They offer incredibly useful guarantees and have been battle-tested.

  4. Referential Integrity

  5. ACID Transactions

  6. And SQL: SQL is a powerful, expressive DSL (domain-specific language) that many, many people understand.

  7. So Why No SQL?

  8. Web Scale

  9. Web scale can be done in SQL

  10. How? • Vertical partitioning / logical sharding (Instagram) • Caching (28 terabytes, Facebook, 2008) • SQL + No SQL • Think about your architecture. Want to learn more? Spend time on http://highscalability.com/
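
To make the "logical sharding" bullet concrete, here is a minimal Python sketch of the general idea: each ID maps to a fixed logical shard, and logical shards are assigned to physical servers separately. The shard count, server names, and function names are all hypothetical, and this is not a description of how Instagram or anyone else actually implements it.

```python
# Hypothetical numbers: many logical shards spread over a few physical servers.
NUM_LOGICAL_SHARDS = 4096
PHYSICAL_SERVERS = ["db0", "db1", "db2", "db3"]  # made-up host names


def logical_shard(profile_id: int) -> int:
    """Map a profile ID to a logical shard. This mapping never changes."""
    return profile_id % NUM_LOGICAL_SHARDS


def physical_server(shard: int) -> str:
    """Map a logical shard to the server that currently hosts it."""
    shards_per_server = NUM_LOGICAL_SHARDS // len(PHYSICAL_SERVERS)
    return PHYSICAL_SERVERS[shard // shards_per_server]


def server_for(profile_id: int) -> str:
    """Route a profile ID to a server via its logical shard."""
    return physical_server(logical_shard(profile_id))
```

Because only the shard-to-server assignment changes when servers are added, the ID-to-shard mapping can stay stable as the system grows.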

  11. But a reasonable question is: how much time should we be devoting to managing scaling problems versus adding business value to these systems?

  12. So what are we giving up?

  13. CAP: Consistency, Availability, Partition tolerance. Systems placed on the diagram: RDBMSs (SQL Server, MySQL, Oracle), MongoDB, Redis, HBase (Hadoop), Google BigTable, Couch, Cassandra, Dynamo, SimpleDB, Voldemort

  14. FriendsWhoCook.com: A social network of friends who enjoy cooking great food. • Add my recipes • Add my friends • Show my friends • Like / comment on my friends' recipes • Search recipes of my friends, their friends, and so on by.

  15. Problem 1: Store Recipes

  16. Fairly Simple Object
      class Recipe {
          Image Photo
          List<Comments> Comments
          List<Ingredients> Ingredients
          List<ProfileId> Likes
          Category RecipeCategory
      }

  17. Becomes a complex RDBM’ess

  18. Object-Relational Impedance Mismatch

  19. No SQL: Document Store • Data element is a document • Documents grouped into collections • Often store in JSON • Works great with Domain Driven Design • Schema-less
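
As a rough illustration of the bullets above, the following Python sketch stores a recipe as a single JSON document inside a collection. The collection is just an in-memory dict standing in for a real document store, and every field name and value is made up for the example.

```python
import json

# One recipe as a self-contained document; nested lists replace join tables.
recipe = {
    "_id": "recipe-1",
    "photo": "lasagna.jpg",
    "category": "Pasta",
    "ingredients": [
        {"name": "noodles", "quantity": "12 sheets"},
        {"name": "ricotta", "quantity": "2 cups"},
    ],
    "comments": [{"profile_id": 2, "text": "Delicious!"}],
    "likes": [2, 3],
}

# Schema-less: a second document in the same collection may have other fields.
quick_note = {"_id": "recipe-2", "category": "Dessert", "video": "tiramisu.mp4"}

# A "collection" here is a dict of JSON strings keyed by document ID.
recipes = {doc["_id"]: json.dumps(doc) for doc in (recipe, quick_note)}

# Reading the document back yields the whole object in one lookup.
loaded = json.loads(recipes["recipe-1"])
```

The whole object round-trips in one read, which is the property that makes document stores a comfortable fit for aggregate-style domain objects.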

  20. Document Store Examples • MongoDB (PC) • CouchDB (PA) • RavenDB (PA)

  21. DEMO: MongoDB

  22. Demo: CouchDB

  23. Problem 2: Model the Social Graph

  24. Friends in RDBMS. For a more sophisticated view of modeling graphs in an RDBMS: http://www.slideshare.net/quipo/rdbms-in-the-social-networks-age

  25. Get my Friends
      DECLARE @ProfileID int
      SELECT FirstDegreeProfile.ID, FirstDegreeProfile.FirstName, FirstDegreeProfile.LastName
      FROM Profile AS FirstDegreeProfile
      JOIN Friendship ON FirstDegreeProfile.ID = Friendship.FriendID
      WHERE Friendship.ProfileID = @ProfileID

  26. Friends and their friends
      /* Note: A much better solution would use a recursive CTE to compute transitive closure */
      DECLARE @ProfileID int
      SET @ProfileID = 1
      SELECT FirstDegreeFriendship.FriendID AS MyFriendId,
             SecondDegreeProfile.ID AS SecondDegreeId,
             SecondDegreeProfile.FirstName AS SecondDegreeFirstName,
             SecondDegreeProfile.LastName AS SecondDegreeLastName
      FROM Profile AS SecondDegreeProfile
      JOIN Friendship AS SecondDegreeFriendship ON SecondDegreeProfile.ID = SecondDegreeFriendship.FriendID
      JOIN Friendship AS FirstDegreeFriendship ON SecondDegreeFriendship.ProfileID = FirstDegreeFriendship.FriendID
      WHERE FirstDegreeFriendship.ProfileID = @ProfileID
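
The comment on this slide mentions a recursive CTE. A minimal sketch of that idea follows, run from Python against an in-memory SQLite database rather than the SQL Server dialect used on the slides (SQLite writes `WITH RECURSIVE`, and T-SQL syntax differs slightly). The sample rows, the `Reachable` name, and the `friends_within` helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Friendship (ProfileID INTEGER, FriendID INTEGER);
    -- Made-up sample data: 1 -> 2 -> 3 -> 4, and 1 -> 5
    INSERT INTO Friendship VALUES (1, 2), (2, 3), (3, 4), (1, 5);
""")

QUERY = """
WITH RECURSIVE Reachable(FriendID, Degree) AS (
    -- Anchor: direct friends of the starting profile
    SELECT FriendID, 1 FROM Friendship WHERE ProfileID = :profile_id
    UNION
    -- Recursive step: friends of anyone already reached, up to a depth limit
    SELECT f.FriendID, r.Degree + 1
    FROM Friendship AS f
    JOIN Reachable AS r ON f.ProfileID = r.FriendID
    WHERE r.Degree < :max_degree
)
SELECT FriendID, MIN(Degree) AS Degree
FROM Reachable
GROUP BY FriendID
ORDER BY Degree, FriendID
"""


def friends_within(profile_id, max_degree):
    """Return (friend_id, degree) pairs reachable within max_degree hops."""
    params = {"profile_id": profile_id, "max_degree": max_degree}
    return conn.execute(QUERY, params).fetchall()
```

The depth limit keeps the recursion finite even if the friendship data contains cycles; with the sample rows, `friends_within(1, 2)` covers direct friends and friends of friends in one query instead of one self-join per degree.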

  27. Graph Databases • Optimized for graph data • Check out Neo4j
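
To show why a graph-shaped model suits this problem, here is a small Python sketch that walks an adjacency list breadth-first to find friends by degree. It only illustrates the traversal idea and is not how Neo4j or any particular graph database is queried; the `friends` data and the function name are hypothetical.

```python
from collections import deque

# Made-up adjacency list: profile ID -> list of friend IDs.
friends = {1: [2, 5], 2: [3], 3: [4], 4: [], 5: []}


def friends_by_degree(graph, start, max_degree):
    """Breadth-first walk returning {friend_id: degree} up to max_degree hops."""
    seen = {start}
    result = {}
    queue = deque([(start, 0)])
    while queue:
        person, degree = queue.popleft()
        if degree == max_degree:
            continue  # do not expand beyond the requested depth
        for friend in graph.get(person, []):
            if friend not in seen:
                seen.add(friend)
                result[friend] = degree + 1
                queue.append((friend, degree + 1))
    return result
```

Each hop follows stored edges directly, so going one degree deeper costs another step of the walk rather than another join in the query.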

  28. Problem 3: Schemaless / Big Data. Facebook's Network (credit: Traud & Frost, UNC-Chapel Hill)

  29. How do we ask these questions? • After we changed the “like” button icon for half of our users, did we get more or fewer likes from that sample? • Of users who click on our ads, what pages did they spend the most time on? • Which hidden patterns might make us competitive that we aren’t even aware of? Want to get far ahead of the pack? Read “The Lean Startup” by Eric Ries.

  30. Is this Actionable?

  31. How about this?

  32. Wide Column: “A Bigtable is a sparse, distributed, persistent multidimensional sorted map.” Source: http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable
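
One way to read "sparse, multidimensional sorted map" is as nested maps: row key, then column, then timestamp, then value. The Python sketch below models only that shape in memory (nothing distributed or persistent), and the `put`, `get`, and `scan` helpers and the sample rows are assumptions for illustration, not the API of Bigtable or HBase.

```python
# row key -> column ("family:qualifier") -> timestamp -> value
table = {}


def put(row, column, timestamp, value):
    """Store one cell version; rows and columns are created on demand."""
    table.setdefault(row, {}).setdefault(column, {})[timestamp] = value


def get(row, column):
    """Return the newest version of a cell, or None if it was never written."""
    versions = table.get(row, {}).get(column)
    if not versions:
        return None
    return versions[max(versions)]


def scan(start_row, end_row):
    """Return row keys in sorted order within [start_row, end_row)."""
    return [row for row in sorted(table) if start_row <= row < end_row]


# Made-up sample data: two versions of one cell, and a row with a different column.
put("recipe:002", "info:category", 1, "Dessert")
put("recipe:001", "info:title", 1, "Lasagna")
put("recipe:001", "info:title", 2, "Lasagne")
```

Sparse means a missing cell costs nothing and simply reads as absent; sorted means range scans over neighboring row keys are cheap, which is why row-key design matters so much in these stores.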

  33. MapReduce • Map(k, v) → [(k1, v1), (k2, v2), (k1, v3), (k3, v4)] (list of intermediate key/value pairs) • Internal step: takes the list of intermediate key/value pairs and converts it to a key / list of values • Reduce(k, [v1, v2, v3, …]) → (k, n1), (k, n2)
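
The three steps on this slide can be sketched in a few lines of single-process Python, using word counting as the example problem. This is only a toy model of the data flow, not a distributed implementation, and the function names and sample inputs are made up.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map: emit an intermediate (word, 1) pair for each word in the text."""
    for word in value.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Internal step: group intermediate pairs into key -> list of values."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_fn(key, values):
    """Reduce: collapse the list of values for one key into a total."""
    yield key, sum(values)


def map_reduce(inputs):
    """Run map, shuffle, and reduce over a list of (key, value) inputs."""
    intermediate = [pair for k, v in inputs for pair in map_fn(k, v)]
    grouped = shuffle(intermediate)
    return dict(out for k, vs in grouped.items() for out in reduce_fn(k, vs))
```

For example, `map_reduce([("r1", "salt pepper salt"), ("r2", "Pepper")])` counts each word across both inputs. In a real cluster the map and reduce calls run on many machines and the grouping step moves data between them.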

  34. One Downside… • We have to have smart people write MapReduce programs, and the problems need to be expressible as MapReduce. • General solutions are BIG money.

  35. Final thought: Big Data is BIG

  36. Things to Read • Bigtable: A Distributed Storage System for Structured Data • Dynamo: Amazon’s Highly Available Key-value Store • MapReduce: Simplified Data Processing on Large Clusters • The Google File System • Towards Robust Distributed Systems • http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable

  37. Creative Commons Acknowledgments and Thanks! Bobwitlox rosipaw
