NoSQL databases

NoSQL databases • Please remember to read the NoSQL Distilled book and the Seven Databases in Seven Weeks book

Before we start • The classification of the various NoSQL databases is imprecise and semi-controversial, and we have to be careful about reading too much into it. • Rather than focusing on categorizing databases, we should be concerned with what they do, how they relate to each other with respect to functionality, and how they compare to SQL databases.

Key-value and key-document DBs • Databases that access aggregate data • Key-value DBs know nothing about the structure of the aggregate • Key-document databases can see that structure, but what an aggregate means is still interpreted outside the DB • Keep in mind that these two categories of databases overlap in practice • Importantly, both categories focus on storing and retrieving individual aggregates, and not on interrelating (horizontally) multiple aggregates • SQL DBs have something similar: heavily denormalized tables

Important notions… • It can be difficult to represent some domains in key-value or key-document databases, as the boundaries of aggregates might not be easy to determine. • This basic data modeling issue has a lot of influence on the sort of database you should use. • Relational databases don’t manipulate aggregates; they are aggregate-neutral for the most part, leaving the construction of aggregates to run time (typically via joins) … but we might have hidden, denormalized tables that make some commonly used aggregates much faster to materialize

Key-value vs. key-document • In key-value databases, we can only retrieve data via a key • In key-document databases, we may be able to ask questions about the content of documents – but again, we are not cross-associating them • Mongo is perhaps the most talked about key-document system, and so we will start there
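
The difference between the two access styles can be sketched in plain JavaScript. This is only an illustration under stated assumptions: `kvStore`, `docStore`, `kvGet`, and `docFind` are hypothetical in-memory stand-ins, not the API of any real database.

```javascript
// Hypothetical in-memory stand-ins; not a real database API.

// Key-value store: the value is an opaque blob, so the only lookup is by key.
const kvStore = new Map();
kvStore.set("order:1", JSON.stringify({ customerId: "99", total: 40 }));
kvStore.set("order:2", JSON.stringify({ customerId: "12", total: 15 }));

function kvGet(key) {
  // The store hands back the blob unchanged; the application must parse it.
  return kvStore.get(key);
}

// Document store: the store sees the fields, so it can filter on content.
const docStore = new Map([
  ["order:1", { customerId: "99", total: 40 }],
  ["order:2", { customerId: "12", total: 15 }],
]);

function docFind(field, value) {
  // Scan every document and keep those whose field equals the value.
  return [...docStore.values()].filter((doc) => doc[field] === value);
}

const blob = kvGet("order:1");           // a string the store cannot look inside
const hits = docFind("customerId", "99"); // a question about document content
console.log(typeof blob, hits.length);
```

In both cases each aggregate is still handled on its own; neither function relates one order to another.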

Installing Mongo • Mongo • http://docs.mongodb.org/manual/installation • A GUI • http://www.mongodb.org/display/DOCS/Admin+UIs

Mongo overview • Document based • Designed to scale out across clusters for very large data sets • Supports nested documents • Uses JavaScript for queries • No schema

Terminology • A database consists of collections • Collections are made up of documents • A document is made up of fields • There are also indices • There are also cursors

When to use Mongo • Medical records and other large document systems • Read-heavy environments like analytics and mining • Partnered with relational databases: relational for live data, Mongo for huge, largely read-only archives • Online applications, e.g., massively wide e-commerce

Mongo documents and queries • Documents • Self-defining, with hierarchical structure • like XML • Or JSON, which uses JavaScript object syntax to define docs in a human-readable form • Documents can vary in structure, even in the same collection • You can add attributes to new documents in a collection without having to change the existing ones in the collection • Queries: db.order.find({"customerId": "99"})
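
As a rough sketch of what "documents can vary in structure" means, the snippet below uses a plain JavaScript array as a hypothetical stand-in for the order collection. The field names other than customerId, and all the values, are made up for illustration, and the filter only approximates what the find query above does.

```javascript
// Hypothetical in-memory collection: an array of plain objects.
const orders = [];

// An early document has only the fields that were known at the time.
orders.push({ _id: 1, customerId: "99", items: ["book"] });

// A later document adds a nested "shipping" field; the first one is untouched.
orders.push({
  _id: 2,
  customerId: "99",
  items: ["pen", "ink"],
  shipping: { city: "Boulder", express: true },
});

// Rough equivalent of db.order.find({"customerId": "99"}).
const found = orders.filter((doc) => doc.customerId === "99");

// Documents serialize to JSON, which carries field names along with values.
console.log(JSON.stringify(found, null, 2));

// Only the newer document has the extra field.
const withShipping = orders.filter((doc) => "shipping" in doc);
```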

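Consistency and transactions • Consistency is tunable: for each update you can choose how many replicas of a document must acknowledge the write • No multi-document atomic transactions were supported when these slides were written (MongoDB added them in version 4.0); updates to a single document are atomic • CAP theorem, which basically says that when the network partitions there is a tradeoff between availability and consistency • You can embed references to other documents in a document, but following them tends to create a “join effect” that the application has to handle itself • DBRef is the convention for such references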
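
To make the "join effect" concrete, here is a minimal sketch that follows a reference by hand. It assumes the DBRef convention of storing the target collection name in `$ref` and the target `_id` in `$id`; the `db` object, the `resolveRef` helper, and the sample data are hypothetical and stand in for real collections.

```javascript
// Hypothetical in-memory database: collection name -> array of documents.
const db = {
  customer: [{ _id: 99, name: "Ada" }],
  order: [
    // The $ref/$id pair follows the DBRef convention: collection name plus _id.
    { _id: 1, total: 40, customer: { $ref: "customer", $id: 99 } },
  ],
};

// Follow one reference by hand: a second lookup, done by the application.
function resolveRef(database, ref) {
  const collection = database[ref.$ref] || [];
  return collection.find((doc) => doc._id === ref.$id) || null;
}

const order = db.order[0];
const customer = resolveRef(db, order.customer);
console.log(order._id, customer.name);
```

Every reference followed this way costs an extra lookup, which is why heavy use of references starts to look like joins done in application code.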

Selectors • Used for finding, counting, updating, and removing docs from collections • {} is the empty selector and matches all documents • We could run: {gender: 'f'} • {field1: value1, field2: value2} creates an ‘and’ operation • Also comparison operators such as less than and greater than ($lt, $gt) • $exists, $or
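
The selector rules listed above can be illustrated with a toy matcher in plain JavaScript. This is a sketch of the semantics only, covering equality, $gt, $lt, $exists, and $or on top-level fields; `matchesSelector`, `matchesCondition`, and the sample `people` data are hypothetical, and this is not how Mongo itself evaluates selectors.

```javascript
// Toy matcher for a small subset of selector syntax; not MongoDB's implementation.
function matchesCondition(doc, field, condition) {
  const value = doc[field];
  const isOperatorObject =
    condition !== null && typeof condition === "object" && !Array.isArray(condition);
  if (!isOperatorObject) {
    return value === condition; // plain equality, e.g. {gender: 'f'}
  }
  // Every operator in the object must hold for this field.
  return Object.entries(condition).every(([op, operand]) => {
    switch (op) {
      case "$gt": return value > operand;
      case "$lt": return value < operand;
      case "$exists": return (field in doc) === operand;
      default: throw new Error(`unsupported operator: ${op}`);
    }
  });
}

function matchesSelector(doc, selector) {
  // Every top-level entry must hold, which gives the implicit 'and'.
  return Object.entries(selector).every(([key, condition]) => {
    if (key === "$or") {
      // At least one of the sub-selectors must match.
      return condition.some((sub) => matchesSelector(doc, sub));
    }
    return matchesCondition(doc, key, condition);
  });
}

const people = [
  { name: "a", gender: "f", age: 31 },
  { name: "b", gender: "m", age: 17 },
  { name: "c", gender: "f", age: 12, nickname: "cee" },
];
const find = (selector) =>
  people.filter((p) => matchesSelector(p, selector)).map((p) => p.name);

console.log(find({}));                                  // [ 'a', 'b', 'c' ]
console.log(find({ gender: "f", age: { $gt: 18 } }));   // [ 'a' ]
console.log(find({ $or: [{ gender: "m" }, { nickname: { $exists: true } }] })); // [ 'b', 'c' ]
```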

Some notes on Mongo • There are a few GUIs that seem pretty good • Mongo-vision: http://code.google.com/p/mongo-vision/ (web page) • Needs Prudence as a web server • MongoVue: http://mongovue.com, but Windows only • RockMongo (web based): http://rockmongo.com/ (web page) • Needs an Apache web server • Mongo itself is very easy to install, just download it • http://docs.mongodb.org/manual/installation

Getting an Apache web server • XAMPP for Windows (the Mac version is way out of date) • MAMP for Macs (on the App Store) • WAMP for Windows (bitnami.org) • All of these give you PHP and MySQL as well. If we have time, we will look at MySQL full-text search. • You might want to install PostgreSQL, too. There is a Bitnami stack. If there is time, we will look at PostgreSQL user-defined types (UDTs) and full-text search.

Another document DB: CouchDB • Major focus: surviving network problems • Engineered for web use • No ad hoc querying; searching is via map reduce-based indices, which CouchDB calls views • We will get back to CouchDB

Map Reduce • Focus is on performing data operations on parallel hardware • This is a paradigm, not a specific programmatic technique • Each map reduce process has two phases • Convert a list into a desired sort of list with the map operator • Convert the new list into a small number of atomic values via a reduce operator • This allows us to spread a process across a wide array of servers, with each server performing an independent map reduce process

Map reduce example, from Seven DBs • Map phase: go through a list of items, find all that are related to Canada, and turn each of them into a 1 • Reduce phase: compress this second list by adding up the 1’s to get the cardinality • The first list could be spread across an array of machines, with each machine’s partial result passed on to a smaller number of machines and finally combined on a single machine.
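
The Canada example can be sketched in plain JavaScript. This is a single-process illustration of the paradigm, not how any particular database runs it: the `partitions` array is hypothetical data standing in for lists held on separate servers, and `mapPhase` and `reducePhase` are made-up names.

```javascript
// Hypothetical data, split into partitions as if held on separate servers.
const partitions = [
  [{ country: "Canada" }, { country: "France" }, { country: "Canada" }],
  [{ country: "Japan" }, { country: "Canada" }],
  [{ country: "France" }],
];

// Map phase: keep items related to Canada and turn each one into a 1.
const mapPhase = (items) =>
  items.filter((item) => item.country === "Canada").map(() => 1);

// Reduce phase: add up a list of numbers into a single value.
const reducePhase = (values) => values.reduce((sum, v) => sum + v, 0);

// Each partition runs its own map and reduce independently.
const partialCounts = partitions.map((p) => reducePhase(mapPhase(p)));

// A final reduce combines the per-partition results.
const canadaCount = reducePhase(partialCounts);
console.log(partialCounts, canadaCount); // [ 2, 1, 0 ] 3
```

Because addition is associative, the same reduce function works both on each partition and on the list of partial results, which is what lets the work be spread out.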
