NoSQL databases Please remember to read the NoSQL Distilled book and the Seven Databases book
Before we start • The classification of the various NoSQL databases is imprecise, semi-controversial, and we have to be careful about reading too much into it. • Rather than focusing on categorizing databases, we should be concerned with what they do, how they relate to each other with respect to functionality, and how they compare to SQL databases.
Key-value and key-document DBs • Databases that access aggregate data • Key-value dbs know nothing about the structure of the aggregate • Key-document databases do know the structure, but the interpretation of these aggregates happens outside the db • Keep in mind that these two categories of databases overlap in practice • Importantly, both categories focus on storing and retrieving individual aggregates, and not on interrelating (horizontally) multiple aggregates • There is something similar to this in SQL DBs: highly denormalized tables
Important notions… • It can be difficult to represent some domains as key-value or key-document databases, as the boundaries of aggregates might not be easy to determine. • This basic data modeling issue has a lot of influence on the sort of database you should use. • Relational databases don’t manipulate aggregates, but they are aggregate neutral for the most part, leaving the construction of aggregates to run time … but we might have hidden, denormalized tables that make some commonly used aggregates much faster to materialize
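To make "leaving the construction of aggregates to run time" concrete, here is a minimal sketch in plain JavaScript. The three arrays stand in for normalized tables, and every name in it (customers, orders, orderLines, buildCustomerAggregate) is hypothetical and not taken from the slides. A document database would instead store the finished nested object directly.

```javascript
// Hypothetical normalized "tables", as a relational database might hold them.
const customers = [{ id: 99, name: "Ann" }];
const orders = [
  { id: 1, customerId: 99 },
  { id: 2, customerId: 99 },
];
const orderLines = [
  { orderId: 1, product: "pen", qty: 2 },
  { orderId: 1, product: "ink", qty: 1 },
  { orderId: 2, product: "pad", qty: 5 },
];

// Relational style: assemble the aggregate at run time by matching keys,
// which is what a pair of joins would do.
function buildCustomerAggregate(customerId) {
  const customer = customers.find((c) => c.id === customerId);
  if (!customer) return null;
  return {
    id: customer.id,
    name: customer.name,
    orders: orders
      .filter((o) => o.customerId === customerId)
      .map((o) => ({
        id: o.id,
        lines: orderLines
          .filter((l) => l.orderId === o.id)
          .map(({ product, qty }) => ({ product, qty })),
      })),
  };
}

// The nested result is the kind of aggregate a document store keeps as one unit.
const aggregate = buildCustomerAggregate(99);
console.log(JSON.stringify(aggregate, null, 2));
```

Deciding where this nested object starts and stops (for example, whether orders belong inside the customer or stand alone) is the aggregate-boundary problem the first bullet describes.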
Key-value vs. key-document • In key-value databases, we can only retrieve data via a key • In key-document databases, we may be able to ask questions about the content of documents – but again, we are not cross-associating them • Mongo is perhaps the most talked about key-document system, and so we will start there
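The difference can be sketched with two in-memory stores in plain JavaScript. This is only an illustration under assumed names (kvStore, docStore, findByField) and does not reflect the API of any real product: the key-value store holds opaque strings, while the document store can look inside its values.

```javascript
// Key-value style: the store sees only an opaque value (here, a string).
const kvStore = new Map();
kvStore.set("order:1", JSON.stringify({ customerId: "99", total: 30 }));
kvStore.set("order:2", JSON.stringify({ customerId: "42", total: 12 }));

// The only access path is the key; the caller interprets the value.
const order1 = JSON.parse(kvStore.get("order:1"));

// Key-document style: the store can inspect fields, so it can answer
// questions about content (still one document at a time, no joins).
const docStore = new Map([
  ["order:1", { customerId: "99", total: 30 }],
  ["order:2", { customerId: "42", total: 12 }],
]);

// Hypothetical helper: scan every document and keep those whose field matches.
function findByField(store, field, value) {
  return [...store.values()].filter((doc) => doc[field] === value);
}

const ordersFor99 = findByField(docStore, "customerId", "99");
console.log(order1.total, ordersFor99.length);
```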
Installing Mongo • Mongo • http://docs.mongodb.org/manual/installation • A GUI • http://www.mongodb.org/display/DOCS/Admin+UIs
Mongo overview • Document based • Focuses on clusters for extremely large scaling • Supports nested documents • Uses JavaScript for queries • No schema
Terminology • A database consists of collections • Collections are made up of documents • A document is made up of fields • There are also indices • There are also cursors
When to use Mongo • Medical records and other large document systems • Read-heavy environments like analytics and mining • Partnered with relational databases • Relational for live data • Mongo for huge, largely read-only archives • Online applications • Massively wide e-commerce
Mongo documents and queries • Documents • Self-defining, with hierarchical structure • Like XML • Or JSON, which uses JavaScript object syntax to define docs in a human-readable form • Documents can vary in structure, even in the same collection • You can add attributes to new documents in a collection without having to change the existing ones in the collection • Queries: db.order.find({"customerId":"99"})
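As a rough illustration of documents that vary in structure, the sketch below models a collection as a plain JavaScript array rather than a live Mongo collection. The field names other than customerId (items, shipTo, giftWrap) are invented for the example, and the filter call only imitates what the find query on the slide asks for.

```javascript
// A hypothetical "order" collection modeled as a plain array of documents.
const order = [
  // A document with a nested list of sub-documents.
  { _id: 1, customerId: "99", items: [{ sku: "A1", qty: 2 }] },
  // A newer document with a nested address and an extra attribute;
  // the older documents do not need to change to accommodate it.
  {
    _id: 2,
    customerId: "99",
    shipTo: { city: "Boulder", state: "CO" },
    giftWrap: true,
  },
  // A sparse document is also acceptable: there is no schema to violate.
  { _id: 3, customerId: "7" },
];

// Plain-JavaScript stand-in for a find on customerId "99".
const matchingOrders = order.filter((doc) => doc.customerId === "99");
console.log(matchingOrders.map((doc) => doc._id));
```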
Consistency and transactions • There is a tailorable consistency command that can be used to set the level you want for updating replicas of documents • No multi-document atomic transactions are supported (this changed in MongoDB 4.0, which added them) • CAP theorem, which basically says that when the network partitions there is a tradeoff between availability and consistency • You can embed references to other documents in a document, but this tends to create a “join effect” • DBRef is the convention for such references
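The “join effect” of embedded references can be seen in a small sketch. A DBRef is stored as a sub-document with $ref (the collection name) and $id (the referenced _id) fields; everything else here, including the in-memory db object and the resolveRef helper, is a hypothetical stand-in for work that a driver or the application would do.

```javascript
// Hypothetical in-memory database: collection name -> array of documents.
const db = {
  customer: [{ _id: 99, name: "Ann" }],
  order: [
    // The order refers to its customer using the DBRef field layout.
    { _id: 1, total: 30, customer: { $ref: "customer", $id: 99 } },
  ],
};

// Following a reference means a second lookup in another collection.
// This extra round trip is the "join effect".
function resolveRef(database, ref) {
  const collection = database[ref.$ref] || [];
  return collection.find((doc) => doc._id === ref.$id) || null;
}

const orderDoc = db.order[0];
const customerDoc = resolveRef(db, orderDoc.customer);
console.log(customerDoc.name);
```

Embedding the customer inside the order would avoid the second lookup, at the cost of duplicating customer data across orders.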
Selectors • Used for finding, counting, updating, and removing docs from collections • {} is the null search and matches all documents • We could run: {gender:'f'} • {field1: value1, field2: value2} creates an 'and' operation • Also, less than, greater than, etc. (e.g., $gt) • $exists, $or
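To show how these selector rules combine, here is a toy matcher in plain JavaScript. It is a simplified sketch, not Mongo's implementation: it handles only top-level fields with equality, $gt, $lt, $exists, and $or, and the sample data and function names are made up for the example.

```javascript
// Evaluate one field condition: either a literal (equality) or an
// operator object such as { $gt: 30 }.
function matchesCondition(value, condition, fieldPresent) {
  const isOperatorObject =
    condition !== null &&
    typeof condition === "object" &&
    !Array.isArray(condition) &&
    Object.keys(condition).some((k) => k.startsWith("$"));
  if (!isOperatorObject) return value === condition;
  return Object.entries(condition).every(([op, operand]) => {
    switch (op) {
      case "$gt":
        return value > operand;
      case "$lt":
        return value < operand;
      case "$exists":
        return fieldPresent === operand;
      default:
        throw new Error(`unsupported operator: ${op}`);
    }
  });
}

// All top-level entries must hold (the implicit 'and');
// $or holds if any of its sub-selectors matches.
function matches(doc, selector) {
  return Object.entries(selector).every(([key, condition]) => {
    if (key === "$or") return condition.some((sub) => matches(doc, sub));
    return matchesCondition(doc[key], condition, key in doc);
  });
}

const find = (docs, selector) => docs.filter((doc) => matches(doc, selector));

// Hypothetical sample collection.
const people = [
  { name: "Ann", gender: "f", age: 34 },
  { name: "Bob", gender: "m", age: 41 },
  { name: "Cy", gender: "m" },
  { name: "Dee", gender: "f", age: 19 },
];

const everyone = find(people, {}); // the null search
const women = find(people, { gender: "f" });
const womenOver30 = find(people, { gender: "f", age: { $gt: 30 } });
const noAge = find(people, { age: { $exists: false } });
const youngOrMale = find(people, { $or: [{ age: { $lt: 20 } }, { gender: "m" }] });
console.log(womenOver30.map((p) => p.name), noAge.map((p) => p.name));
```

The empty selector matches everything because an 'and' over zero conditions is trivially true, which is why {} acts as the null search.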
Some notes on Mongo • There are a few GUIs that seem pretty good • Mongo-vision: http://code.google.com/p/mongo-vision/ (web page) • Needs Prudence as a web server • MongoVue: http://mongovue.com, but Windows only • RockMongo (web based): http://rockmongo.com/ (web page) • Needs an apache web server • Very easy to install, just download • http://docs.mongodb.org/manual/installation
Getting an Apache web server • XAMPP for Windows (Mac version is way out of date) • MAMP for Macs (on the app store) • WAMP for Windows (bitnami.org) • All of these give you PHP and MySQL as well. If we have time, we will look at MySQL full text search. • You might want to install PostgreSQL, too. There is a bitnami stack. If there is time, we will look at PostgreSQL UDTs and full text search.
Another document DB: CouchDB • Major focus: surviving network problems • Engineered for web use • No ad hoc querying, searching is via map reduce-based indices • We will get back to CouchDB
Map Reduce • Focus is on performing data operations on parallel hardware • This is a paradigm, not a specific programmatic technique • Each map reduce process has two phases • Convert a list into a desired sort of list with the map operator • Convert the new list into a small number of atomic values via a reduce operator • This allows us to spread a process across a wide array of servers, with each server performing an independent map reduce process
Map reduce example, from Seven DBs • Map phase: go through a list of items, find all that are related to Canada, and turn them into 1s • Reduce phase: compress this second list by adding up the 1s to get the cardinality • The first list could be spread across an array of machines, with the results being funneled into a smaller number of machines, and the final result funneled into a single machine.
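The Canada count can be sketched in plain JavaScript, with arrays standing in for machines. The shard contents and function names are invented for illustration, and a real system would run each shard's work on a separate server rather than in one process.

```javascript
// Each inner array plays the role of the data held by one machine.
const shards = [
  [
    { name: "Toronto", country: "Canada" },
    { name: "Boston", country: "USA" },
  ],
  [
    { name: "Montreal", country: "Canada" },
    { name: "Vancouver", country: "Canada" },
  ],
  [{ name: "Lyon", country: "France" }],
];

// Map phase: keep the items related to Canada and turn each into a 1.
const mapPhase = (items) =>
  items.filter((item) => item.country === "Canada").map(() => 1);

// Reduce phase: add up the values to get the cardinality.
const reducePhase = (values) => values.reduce((sum, v) => sum + v, 0);

// Every "machine" runs its own independent map and reduce...
const partialCounts = shards.map((shard) => reducePhase(mapPhase(shard)));

// ...and the partial results are reduced again on a single machine.
const total = reducePhase(partialCounts);
console.log(partialCounts, total);
```

Reusing the same reduce function on the partial counts works here because addition is associative, so the order and grouping of the sums do not affect the final count.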