NoSql databases

klaus

  1. NoSQL databases • Please remember to read the NoSQL Distilled book and the Seven Databases in Seven Weeks book

  2. Before we start • The classification of the various nosql databases is imprecise, semi-controversial, and we have to be careful about reading too much into it. • Rather than focusing on categorizing dbs, we should be concerned with what they do, how they relate to each other with respect to functionality, and how they compare to sql databases.

  3. Key-value and key-document DBs • Databases that access data as aggregates • Key-value DBs know nothing about the structure of the aggregate • Key-document databases do know that structure, but the interpretation of these aggregates happens outside the DB • Keep in mind that these two categories overlap in practice • Importantly, both categories focus on storing and retrieving individual aggregates, not on interrelating (horizontally) multiple aggregates • The closest SQL analogue is a highly denormalized table
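
A minimal sketch of what "knows nothing about the structure of the aggregate" means in practice. It assumes a plain JavaScript Map as a stand-in for a key-value store (no real database is involved), and the key format order:99 and the helper updateQuantity are hypothetical. The store holds the whole order aggregate as one opaque string, so changing a single line item means reading the entire aggregate, interpreting it outside the store, and writing it all back.

```javascript
// Stand-in for a key-value store: keys map to opaque strings.
// The "database" never looks inside the value.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  put(key, value) {
    this.data.set(key, value); // value is stored as-is
  }
  get(key) {
    return this.data.get(key); // undefined if the key is absent
  }
}

const kv = new KeyValueStore();

// The whole order aggregate (header plus line items) is a single value.
const order = {
  customerId: "99",
  items: [
    { sku: "A1", qty: 2 },
    { sku: "B7", qty: 1 },
  ],
};
kv.put("order:99", JSON.stringify(order));

// To change one line item, the application reads the whole aggregate,
// parses it outside the store, modifies it, and writes it all back.
function updateQuantity(store, key, sku, qty) {
  const aggregate = JSON.parse(store.get(key));
  const item = aggregate.items.find((i) => i.sku === sku);
  if (item) item.qty = qty;
  store.put(key, JSON.stringify(aggregate));
  return aggregate;
}

const updated = updateQuantity(kv, "order:99", "B7", 5);
console.log(updated.items[1].qty); // 5
```

The unit of storage and retrieval is the aggregate; nothing in the store relates order:99 to any other key.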

  4. Important notions… • It can be difficult to represent some domains in key-value or key-document databases, because the boundaries of aggregates might not be easy to determine. • This basic data modeling issue strongly influences which sort of database you should use. • Relational databases don’t manipulate aggregates; for the most part they are aggregate neutral, leaving the construction of aggregates to run time … but we might have hidden, denormalized tables that make some commonly used aggregates much faster to materialize
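
To make "aggregate neutral" concrete, here is a sketch with a hypothetical schema in which normalized tables are plain arrays of rows. An order-centric aggregate and a customer-centric aggregate are both assembled at run time by joining the same rows, roughly what a relational query does. A key-value or document design would instead have to commit to one of these shapes when the data is stored.

```javascript
// Normalized "tables" as arrays of rows (hypothetical schema).
const customers = [{ id: "99", name: "Ada" }];
const orders = [
  { id: "o1", customerId: "99" },
  { id: "o2", customerId: "99" },
];
const lineItems = [
  { orderId: "o1", sku: "A1", qty: 2 },
  { orderId: "o1", sku: "B7", qty: 1 },
  { orderId: "o2", sku: "C3", qty: 4 },
];

// Build an order-centric aggregate at run time (a join across three tables).
function orderAggregate(orderId) {
  const order = orders.find((o) => o.id === orderId);
  const customer = customers.find((c) => c.id === order.customerId);
  return {
    id: order.id,
    customer: customer.name,
    items: lineItems.filter((li) => li.orderId === orderId),
  };
}

// The same rows can just as easily form a customer-centric aggregate.
function customerAggregate(customerId) {
  const customer = customers.find((c) => c.id === customerId);
  return {
    name: customer.name,
    orders: orders
      .filter((o) => o.customerId === customerId)
      .map((o) => orderAggregate(o.id)),
  };
}

console.log(orderAggregate("o1").items.length); // 2
console.log(customerAggregate("99").orders.length); // 2
```

If an order aggregate were stored directly instead, answering customer-centric questions would need extra reads or duplicated data; that is the modeling trade-off the slide describes.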

  5. Key-value vs. key-document • In key-value databases, we can only retrieve data via a key • In key-document databases, we may be able to ask questions about the content of documents – but again, we are not cross-associating them • Mongo is perhaps the most talked about key-document system, and so we will start there
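
A sketch of the difference in retrieval, again using in-memory stand-ins rather than a real database API (getByKey and findByField are hypothetical names). The key-value side can only answer "what is stored at this key?", while the key-document side can filter on a field inside the documents. Either way, each result is an individual document; nothing cross-associates one document with another.

```javascript
// Documents keyed by id; a document store can see their fields.
const docs = new Map([
  ["p1", { name: "Ann", city: "Boston" }],
  ["p2", { name: "Bo", city: "Denver" }],
  ["p3", { name: "Cy", city: "Boston" }],
]);

// Key-value style: the only question is "what is stored at key K?"
function getByKey(store, key) {
  return store.get(key);
}

// Key-document style: ask about content, e.g. all documents where field === value.
function findByField(store, field, value) {
  const results = [];
  for (const [key, doc] of store) {
    if (doc[field] === value) results.push({ key, ...doc });
  }
  return results;
}

console.log(getByKey(docs, "p2").name); // Bo
console.log(findByField(docs, "city", "Boston").map((d) => d.key)); // [ 'p1', 'p3' ]
```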

  6. Installing Mongo • Mongo • http://docs.mongodb.org/manual/installation • A GUI • http://www.mongodb.org/display/DOCS/Admin+UIs

  7. Mongo overview • Document based • Focuses on clusters for extremely large scaling • Supports nested documents • Uses JavaScript for queries • No schema

  8. Terminology • A database consists of collections • Collections are made up of documents • A document is made up of fields • There are also indices • There are also cursors

  9. When to use Mongo • Medical records and other large document systems • Read-heavy environments such as analytics and data mining • Partnered with relational databases: relational for live data, Mongo for huge, largely read-only archives • Online applications • Massively wide e-commerce

  10. Mongo documents and queries • Documents • Self-defining, with hierarchical structure • Like XML • Or JSON (JavaScript Object Notation), which defines documents in a human-readable form based on JavaScript object syntax • Documents can vary in structure, even within the same collection • You can add attributes to new documents in a collection without having to change the existing ones • Queries: db.order.find({"customerId": "99"})
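
The points about varying structure can be sketched with a plain array standing in for a Mongo collection. This is an assumption for illustration: it does not use the MongoDB driver, and the find helper is hypothetical. Documents in the same collection carry different fields, a new document adds an attribute without the older ones being rewritten, and find imitates only the equality match of db.order.find({"customerId": "99"}).

```javascript
// A "collection" whose documents do not share one schema.
const order = [
  { _id: 1, customerId: "99", total: 40 },
  { _id: 2, customerId: "12", total: 15, coupon: "SPRING" },
];

// Add a document with a brand-new attribute; existing documents are untouched.
order.push({ _id: 3, customerId: "99", total: 70, giftWrap: true });

// Equality-only matching, in the spirit of db.order.find({ customerId: "99" }).
function find(collection, query) {
  return collection.filter((doc) =>
    Object.entries(query).every(([field, value]) => doc[field] === value)
  );
}

const hits = find(order, { customerId: "99" });
console.log(hits.map((d) => d._id)); // [ 1, 3 ]
console.log("giftWrap" in order[0]); // false: old documents were not changed
```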

  11. Consistency and transactions • Consistency is tunable: you can choose how many replicas must acknowledge an update to a document (in Mongo, this setting is the write concern) • At the time of these slides, multi-document atomic transactions were not supported (MongoDB added them in version 4.0) • CAP theorem: when the network partitions, a distributed database must trade consistency against availability • You can embed references to other documents in a document, but this tends to create a “join effect” • DBRef is the convention for such references
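
The sketch below contrasts embedding with referencing. The reference follows the DBRef field layout ($ref for the collection name, $id for the referenced document's _id), but the in-memory db object, findById, and deref are hypothetical stand-ins, not driver calls. The lookup counter makes the "join effect" visible: reading the customer of a referencing order takes a second read.

```javascript
// In-memory stand-in for two collections.
const db = {
  customers: [{ _id: "99", name: "Ada", city: "Boston" }],
  orders: [
    // Embedded: the customer data travels inside the order document.
    { _id: "o1", customer: { name: "Ada", city: "Boston" }, total: 40 },
    // Referenced: DBRef-style pointer to a document in another collection.
    { _id: "o2", customer: { $ref: "customers", $id: "99" }, total: 70 },
  ],
};

let lookups = 0; // counts reads, to make the extra round trip visible

function findById(collectionName, id) {
  lookups += 1;
  return db[collectionName].find((doc) => doc._id === id);
}

// Resolve a DBRef-shaped value by reading the referenced collection.
function deref(value) {
  if (value && typeof value === "object" && "$ref" in value && "$id" in value) {
    return findById(value.$ref, value.$id);
  }
  return value; // already embedded, no extra read
}

lookups = 0;
const embedded = findById("orders", "o1");
console.log(deref(embedded.customer).name, lookups); // Ada 1

lookups = 0;
const referenced = findById("orders", "o2");
console.log(deref(referenced.customer).name, lookups); // Ada 2
```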

  12. Selectors • Used for finding, counting, updating, and removing documents from collections • {} is the null search and matches all documents • We could run: {gender: 'f'} • {field1: value1, field2: value2} creates an ‘and’ operation • Also less than, greater than, etc. ($lt, $gt) • $exists, $or
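
The selector rules can be illustrated with a small matcher. This is a hypothetical re-implementation for teaching, not MongoDB's code: it covers only equality, implicit AND across fields, $gt, $lt, $exists, and a top-level $or. Real MongoDB supports many more operators and handles arrays, nested paths, and type ordering, none of which this sketch attempts.

```javascript
// Does one field satisfy a condition (a literal value or an operator object)?
function matchesCondition(doc, field, condition) {
  const isOperatorObject =
    condition !== null &&
    typeof condition === "object" &&
    Object.keys(condition).some((k) => k.startsWith("$"));
  if (!isOperatorObject) return doc[field] === condition; // plain equality

  return Object.entries(condition).every(([op, arg]) => {
    const value = doc[field];
    switch (op) {
      case "$gt":
        return value !== undefined && value > arg;
      case "$lt":
        return value !== undefined && value < arg;
      case "$exists":
        return (field in doc) === arg;
      default:
        throw new Error(`unsupported operator ${op}`);
    }
  });
}

// Top-level selector: every entry must match (implicit AND); $or takes a list.
function matches(doc, selector) {
  return Object.entries(selector).every(([key, condition]) =>
    key === "$or"
      ? condition.some((sub) => matches(doc, sub))
      : matchesCondition(doc, key, condition)
  );
}

const people = [
  { name: "Ann", gender: "f", age: 34 },
  { name: "Bo", gender: "m", age: 19 },
  { name: "Cy", gender: "f", age: 17, email: "cy@example.com" },
];

const names = (sel) => people.filter((p) => matches(p, sel)).map((p) => p.name);

console.log(names({})); // [ 'Ann', 'Bo', 'Cy' ] (null search matches all)
console.log(names({ gender: "f" })); // [ 'Ann', 'Cy' ]
console.log(names({ gender: "f", age: { $gt: 18 } })); // [ 'Ann' ]
console.log(names({ email: { $exists: true } })); // [ 'Cy' ]
console.log(names({ $or: [{ age: { $lt: 18 } }, { gender: "m" }] })); // [ 'Bo', 'Cy' ]
```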

  13. Some notes on Mongo • There are a few GUIs that seem pretty good • Mongo-vision: http://code.google.com/p/mongo-vision/ (web page) • Needs Prudence as a web server • MongoVue: http://mongovue.com, but Windows only • RockMongo (web based): http://rockmongo.com/ (web page) • Needs an apache web server • Very easy to install, just download • http://docs.mongodb.org/manual/installation

  14. Getting an Apache web server • XAMPP for Windows (the Mac version is way out of date) • MAMP for Macs (on the App Store) • WAMP for Windows (bitnami.org) • All of these give you PHP and MySQL as well. If we have time, we will look at MySQL full text search. • You might want to install PostgreSQL, too. There is a Bitnami stack. If there is time, we will look at PostgreSQL UDTs and full text search.

  15. Another document DB: CouchDB • Major focus: surviving network problems • Engineered for web use • No ad hoc querying; searching is done through indices built with map reduce (views) • We will get back to CouchDB
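
CouchDB views are defined by JavaScript map functions that call emit(key, value) for each document, and CouchDB keeps the emitted rows sorted by key so that queries read this index instead of searching documents ad hoc. The sketch below only imitates that idea in plain JavaScript: buildView and queryView are hypothetical stand-ins for work the CouchDB server does, and here emit is passed to the map function as a parameter rather than provided by the server.

```javascript
// Documents as they might be stored in a CouchDB database.
const docs = [
  { _id: "a", type: "person", name: "Ann", country: "Canada" },
  { _id: "b", type: "person", name: "Bo", country: "Mexico" },
  { _id: "c", type: "city", name: "Ottawa" },
  { _id: "d", type: "person", name: "Cy", country: "Canada" },
];

// A view's map function: emit (key, value) rows for the documents it cares about.
const byCountry = (doc, emit) => {
  if (doc.type === "person") emit(doc.country, doc.name);
};

// Hypothetical harness: run the map over every document once and keep the
// rows sorted by key, like a prebuilt index.
function buildView(documents, mapFn) {
  const rows = [];
  for (const doc of documents) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Queries read the prebuilt rows, not the documents. A real index would
// range-scan the sorted rows; this sketch simply filters them.
function queryView(rows, key) {
  return rows.filter((row) => row.key === key).map((row) => row.value);
}

const view = buildView(docs, byCountry);
console.log(queryView(view, "Canada")); // [ 'Ann', 'Cy' ]
```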

  16. Map Reduce • Focus is on performing data operations on parallel hardware • This is a paradigm, not a specific programmatic technique • Each map reduce process has two phases • Convert a list into the desired sort of list with the map operator • Convert the new list into a small number of atomic values with a reduce operator • This allows us to spread a process across a wide array of servers, with each server performing an independent map reduce process
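
In its simplest form the two phases correspond to the map and reduce operators that JavaScript arrays already provide. The order records below are illustrative data, not from the slides: the map phase turns the list of orders into a list of totals, and the reduce phase collapses that list into a small number of atomic values, here a count and a sum.

```javascript
// Illustrative input: one record per order.
const orders = [
  { id: 1, total: 40 },
  { id: 2, total: 15 },
  { id: 3, total: 70 },
  { id: 4, total: 25 },
];

// Map phase: convert the list into the list we actually want (the totals).
const totals = orders.map((order) => order.total);

// Reduce phase: collapse that list into a few atomic values.
const summary = totals.reduce(
  (acc, total) => ({ count: acc.count + 1, sum: acc.sum + total }),
  { count: 0, sum: 0 }
);

console.log(totals); // [ 40, 15, 70, 25 ]
console.log(summary); // { count: 4, sum: 150 }
```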

  17. Map reduce example, from Seven DBs • Map phase: go through a list of items, find all that are related to Canada, and turn each of them into a 1 • Reduce phase: compress this second list by adding up the 1s to get the cardinality • The first list could be spread across an array of machines; each machine's results are reduced to a smaller number of values, and those are combined into the final result on a single machine.
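
The Canada example can be sketched as follows. The item records and the three-way split across "machines" are illustrative assumptions. Each machine maps its own items to 1s and reduces them to a partial count independently; because addition is associative, the partial counts can then be reduced once more on a single machine to give the final cardinality.

```javascript
// Illustrative items, already spread across three machines.
const machines = [
  [{ name: "Toronto", country: "Canada" }, { name: "Lyon", country: "France" }],
  [{ name: "Calgary", country: "Canada" }, { name: "Halifax", country: "Canada" }],
  [{ name: "Osaka", country: "Japan" }],
];

// Map phase: keep the items related to Canada and turn each into a 1.
const mapToOnes = (items) =>
  items.filter((item) => item.country === "Canada").map(() => 1);

// Reduce phase: add up the 1s.
const sum = (values) => values.reduce((a, b) => a + b, 0);

// Each machine runs map and reduce on its own items, independently.
const partialCounts = machines.map((items) => sum(mapToOnes(items)));

// The partial results are reduced again on a single machine.
const total = sum(partialCounts);

console.log(partialCounts, total); // [ 1, 2, 0 ] 3
```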
