Is NoSQL the Future of Data Storage? By Gary Short Developer Express
Introduction • Gary Short • Technical Evangelist for Developer Express • C# MVP • gary@garyshort.org • www.garyshort.org • @garyshort.
Where Does NoSQL Originate? • 1998 • Open source relational database • Didn’t expose an SQL interface • Created by Carlo Strozzi • Said the NoSQL movement • “departs from the relational model altogether...” • “...should have been called ‘NoREL’”.
More Recently... • Eric Evans reintroduced the term in 2009 • Johan Oskarsson (last.fm) • Event to discuss open source distributed databases • This labels a growing number of datastores • Open source • Non-relational • Distributed • (often) don’t guarantee ACID.
Atlanta 2009 • No:sql(east) conference • Billed as “conference of no-rel datastores” • Worst tag line ever • SELECT fun, profit FROM real_world WHERE rel=false.
Key Attributes of NoSQL Databases • Don’t require fixed table schemas • Non-relational • (Usually) avoid join operations • Scale horizontally • Adding more nodes to a storage system.
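To make "scale horizontally" concrete, here is a minimal sketch of spreading keys across storage nodes by hashing. The node names and the `node_for` helper are hypothetical and not taken from any particular product.

```python
import hashlib

def node_for(key, nodes):
    """Pick a storage node for a key by hashing the key (illustrative only)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]

# Hypothetical cluster: adding capacity means adding names to this list.
nodes = ["node-a", "node-b", "node-c"]
placement = {key: node_for(key, nodes) for key in ["user:1", "user:2", "user:3"]}
```

Plain modulo hashing moves many keys when the node list changes; real systems often use consistent hashing or a lookup table to limit that movement.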
Document Store • Apache Jackrabbit • CouchDB • MongoDB • SimpleDB • XML Databases • MarkLogic Server • eXist.
Document What? • Okay think of a web page... • Relational model requires column/tag • Lots of empty columns • Wasted space • Document model just stores the pages as is • Saves on space • Very flexible.
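The contrast on this slide can be sketched in a few lines. The pages and field names below are invented for illustration. The relational-style layout needs one column for every field seen on any page, while the document-style layout stores each page as it is.

```python
import json

# Hypothetical pages: each one carries a different set of fields.
pages = [
    {"url": "/home", "title": "Home", "body": "Welcome"},
    {"url": "/contact", "title": "Contact",
     "email": "info@example.org", "phone": "555-0100"},
]

# Relational-style layout: one column per field that appears on any page.
columns = sorted({key for page in pages for key in page})
rows = [[page.get(column) for column in columns] for page in pages]
empty_cells = sum(cell is None for row in rows for cell in row)

# Document-style layout: store each page as-is, keyed by its URL.
document_store = {page["url"]: json.dumps(page) for page in pages}
```

With only two pages the table already holds three empty cells; the document store holds none, and a new field needs no schema change.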
Graph Storage • AllegroGraph • Core Data • Neo4j • DEX • FlockDB.
Which Means? • Graph consists of • Node (‘stations’ of the graph) • Edges (lines between them) • FlockDB • Created by the Twitter folks • Nodes = Users • Edges = Nature of relationship between nodes.
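A minimal sketch of the same idea in code, with users as nodes and the relationship type carried on each edge. The user names and the `build_index` helper are hypothetical; this is not FlockDB's API.

```python
from collections import defaultdict

# Edges as (source, relationship, destination) triples.
edges = [
    ("alice", "follows", "bob"),
    ("carol", "follows", "bob"),
    ("bob", "follows", "alice"),
]

def build_index(edge_list):
    """Index edges in both directions so either end can be looked up."""
    forward = defaultdict(set)
    backward = defaultdict(set)
    for source, relationship, destination in edge_list:
        forward[(source, relationship)].add(destination)
        backward[(destination, relationship)].add(source)
    return forward, backward

forward, backward = build_index(edges)
following = forward[("alice", "follows")]  # who alice follows
followers = backward[("bob", "follows")]   # who follows bob
```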
Key/Value Stores • On disk • Cache in RAM • Eventually Consistent • Weak Definition • “If no updates occur for a period, eventually all updates will propagate through the system and all replicas will be consistent” • Strong Definition • “for a given update and a given replica eventually either the update reaches the replica or the replica retires” • Ordered • Distributed Hash Table allows lexicographical processing.
Object Databases • Db4o • GemStone/S • InterSystems Caché • Objectivity/DB • ZODB.
You Need Constant Consistency • You’re dealing with financial transactions • You’re dealing with medical records • You’re dealing with bonded goods • Best you use an RDBMS.
You Need Horizontal Scalability • You’re working across defined timezones • You’re aggregating large quantities of data • Maintaining a chat server (Facebook chat) • Use NoSQL.
Up in the Clouds Baby • If you are using Azure or AWS • Compare costs of Azure Storage or SimpleDB to SQL Azure or Elastic RDBMS • Could be cheaper for your scenario.
Frequently Written Rarely Read • Think web counters and the like • Every time a user comes to a page = ctr++ • But it’s only read when the report is run • Use NoSQL (key-value storage).
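A sketch of this access pattern, with an in-memory `Counter` standing in for a key-value store. The function names are hypothetical. Every page view is a cheap increment on one key, and the only read happens when the report runs.

```python
from collections import Counter

hits = Counter()  # stands in for a key-value store: page -> count

def record_visit(page):
    """Write path: called on every page view (ctr++)."""
    hits[page] += 1

def run_report():
    """Read path: called rarely, returns pages ordered by hit count."""
    return hits.most_common()

for page in ["/home", "/about", "/home"]:
    record_visit(page)
```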
I Got Big Data! • Think weather stats • Satellite Images • Maps • Use NoSQL (something like Hadoop).
Binary Baby! • If you are YouTube • Flickr • Twitpic • Spotify • NoSQL (Amazon S3).
Here Today Gone Tomorrow • Transient data like... • Web Sessions • Locks • Short Term Stats • Shopping cart contents • Use NoSQL (Memcache).
Data Replication • Same data in two or more locations • Music Library • Web browser • iPhone App • NoSQL (CouchDB).
Hit me Baby One More Time! • High Availability • High number of important transactions • Online gambling • Pay Per view • Ahem! • Online Auction • NoSQL (Cassandra – automatic clustering).
Give me a Real World Example • Twitter • The challenges • Needs to store many graphs • Who you are following • Who’s following you • Who you receive phone notifications from etc • To deliver a tweet requires rapid paging of followers • Heavy write load as followers are added and removed • Set arithmetic for @mentions (intersection of users).
What Did They Try? • Relational Databases • Key-Value storage of denormalized lists • Did it work? • Nope! • Either good at • Handling the write load • Or paging large amounts of data • But not both.
What Did They Need? • Simplest possible thing that would work • Allow for horizontal partitioning • Allow write operations to • Arrive out of order • Or be processed more than once • Failures should result in redundant work • Not lost work!
The Result was FlockDB • Stores graph data • Not optimised for graph traversal operations • Optimised for large adjacency lists • List of all edges in a graph • Key is the edge, value is a set of the node end points • Optimised for fast read and write • Optimised for page-able set arithmetic.
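"Page-able set arithmetic" can be illustrated with a cursor-based intersection of two sorted adjacency lists. This is a sketch under assumptions, not FlockDB's query interface: the function name, the cursor convention, and the sample ids are all invented.

```python
def intersect_page(list_a, list_b, cursor=None, page_size=2):
    """Return one page of ids present in both lists, plus the next cursor.

    Assumes list_a is sorted ascending. The cursor is the last id already
    returned; None means start from the beginning (or no more pages).
    """
    members_b = set(list_b)
    page = []
    for user_id in list_a:
        if cursor is not None and user_id <= cursor:
            continue  # already delivered on an earlier page
        if user_id in members_b:
            page.append(user_id)
            if len(page) == page_size:
                return page, user_id
    return page, None

followers_of_a = [1, 3, 5, 7, 9]
followers_of_b = [3, 4, 5, 9, 10]
first_page, next_cursor = intersect_page(followers_of_a, followers_of_b)
second_page, end_cursor = intersect_page(followers_of_a, followers_of_b, next_cursor)
```

Because the cursor is an id and not an offset, a page request stays valid even if edges are added or removed between requests.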
How Does it Work? • Stores graphs as sets of edges between nodes • Data is partitioned by node • All queries can be answered by a single partition • Write operations are idempotent • Can be applied multiple times without changing the result • And commutative • Changing the order of operands doesn’t change the result.
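The two properties on this slide can be demonstrated with a small sketch. It assumes each write carries a timestamp and that the newest write per edge wins. That rule is an assumption chosen for illustration, and `apply_write` and `replay` are hypothetical helpers, not FlockDB code.

```python
def apply_write(store, write):
    """Apply (source, destination, state, timestamp); newest write per edge wins."""
    source, destination, state, timestamp = write
    key = (source, destination)
    candidate = (timestamp, state)
    current = store.get(key)
    # Taking the maximum is idempotent and commutative, so duplicates
    # and reordering cannot change the final value.
    if current is None or candidate > current:
        store[key] = candidate
    return store

def replay(sequence):
    """Build a fresh store by applying writes in the given order."""
    store = {}
    for write in sequence:
        apply_write(store, write)
    return store

writes = [
    ("alice", "bob", "added", 1),
    ("alice", "bob", "removed", 2),
    ("alice", "carol", "added", 3),
]

in_order = replay(writes)
reordered_with_duplicates = replay([writes[2], writes[1], writes[0], writes[1]])
```

Both replays end in the same state, which is what lets a failed or delayed write be retried blindly.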
Commutative Writes Help Bring up Partitions • Partition can receive write traffic immediately • Receive dump of data in the background • Live for read as soon as the dump is complete.
Performance? • Currently store 13 billion edges • 20K writes / second • 100K reads / second.
Lessons Learned? • Use aggressive timeouts • Cut a client loose after timeout expired • Let it try again on another app server • Use same code path for error and normal ops • Error requests are periodically retried • Instrument.
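The timeout lesson might look like this in outline. The servers are simulated with plain functions, and `RequestTimeout` and `call_with_failover` are hypothetical names; a real client would enforce the deadline at the network layer.

```python
class RequestTimeout(Exception):
    """Raised by a (simulated) server call that exceeded its deadline."""

def call_with_failover(servers, request, timeout_seconds=0.05):
    """Try each app server in turn, giving up on one as soon as it times out."""
    last_error = None
    for server in servers:
        try:
            return server(request, timeout_seconds)
        except RequestTimeout as error:
            last_error = error  # cut this attempt loose and move on
    raise RequestTimeout("all servers timed out") from last_error

def slow_server(request, timeout_seconds):
    # Simulates a server that cannot answer within the deadline.
    raise RequestTimeout("simulated timeout")

def fast_server(request, timeout_seconds):
    return "ok: " + request

result = call_with_failover([slow_server, fast_server], "get followers")
```

Retrying like this is only safe because the writes are idempotent: a request that timed out may still have been applied.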
Punchline? • Under all the bells and whistles... • It’s MySQL.
So is this the Future? • Yes! • And No!
Questions? • Contact me • gary@garyshort.org • @garyshort
Coming up… • P/X001 Understanding and Preventing SQL Injection Attacks, Kevin Kline • P/L001 SSIS Fieldnotes, Darren Green • P/L002 The (Geospatial) Shapes of Things to Come, Simon Munro • P/L005 End to End Master Data Management with SQL Server Master Data Services, Jeremy Kashel • P/T007 Understanding Microsoft Certification in SQL Server, Chris Testa-O'Neill • #SQLBITS