Column-based dbs

Column-based DBs: BigTable, HBase, SimpleDB, and Cassandra

But first, the third assignment • This is due on Monday, the 18th, by the beginning of class • As with the first assignment, contact the grader when you are done • Build a Neo4j database with the Neo4j web GUI (localhost:7474) and Cypher and/or Gremlin • Note that the Console tab gives access to the documentation • The Console tab also gives access to Gremlin • You can use Cypher, Gremlin, or both for the assignment

3rd assignment, continued • Your Neo4j databases • Model customer sites and service personnel • Use at least 15 sites and 6 personnel • Each site is a node • Each service person is a node • As calls come in • a property is created for the given site that describes the nature of the problem • a person is assigned to that site (and a relationship is created) • Each site node has a property that specifies the nature of its problem • Each person has a property that specifies the kinds of problems he/she can solve

3rd assignment, continued • Support the following operations • Creating a site • Creating a service person • Assigning a problem property to a site (sites can have many of these) • Assigning a specialty to a service person (personnel can have many of these) • Assigning a person to a site • Removing a problem property and the relationship that corresponds to it • Removing a site • Removing a service person • Anything you want to add…
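
The assignment targets Neo4j, but the shape of the data model and the required operations can be seen independently of Cypher or Gremlin. Below is a minimal in-memory Python sketch, not Neo4j code. The class and method names (ServiceGraph, assign, and so on) are hypothetical, and the check that a person has the matching specialty before being assigned is an optional design choice, not a stated requirement.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory stand-in for the Neo4j graph (plain Python, not the Neo4j API).

@dataclass
class Site:
    name: str
    problems: set = field(default_factory=set)      # problem properties on the site node

@dataclass
class Person:
    name: str
    specialties: set = field(default_factory=set)   # kinds of problems this person can solve

class ServiceGraph:
    def __init__(self):
        self.sites = {}
        self.people = {}
        self.assignments = set()  # (person, site, problem) relationships

    def create_site(self, name):
        self.sites[name] = Site(name)

    def create_person(self, name):
        self.people[name] = Person(name)

    def add_problem(self, site, problem):
        self.sites[site].problems.add(problem)

    def add_specialty(self, person, specialty):
        self.people[person].specialties.add(specialty)

    def assign(self, person, site, problem):
        # Optional check (an assumption): only assign someone who can solve the problem.
        if problem not in self.sites[site].problems:
            raise ValueError(f"{site} has no problem {problem!r}")
        if problem not in self.people[person].specialties:
            raise ValueError(f"{person} cannot solve {problem!r}")
        self.assignments.add((person, site, problem))

    def remove_problem(self, site, problem):
        # Remove the property and the relationship that corresponds to it.
        self.sites[site].problems.discard(problem)
        self.assignments = {a for a in self.assignments
                            if not (a[1] == site and a[2] == problem)}

    def remove_site(self, site):
        del self.sites[site]
        self.assignments = {a for a in self.assignments if a[1] != site}

    def remove_person(self, person):
        del self.people[person]
        self.assignments = {a for a in self.assignments if a[0] != person}
```

In a graph database, each tuple in `assignments` would correspond to a relationship between a person node and a site node.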

Column-based DBs • BigTable • The first notable column-based DB • No schema • Sparse tables: empty cells are not stored • Groups (or families) of columns are stored together

Basic concepts • The first column is the row key • The column structure comes next • A column family is a group of columns • We can select all the columns in a family or a single column • The idea is that the columns in a family are often accessed together • Generally, new columns can be added to a row at run time, but adding new families might require taking the table offline
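
To make "sparse table with column families" concrete, here is a minimal Python sketch that stores a table as nested maps: row key, then family, then column. It is a toy model, not the BigTable or HBase API; the class name SparseTable and the rule that families are fixed at creation (mirroring the slide's note that new families may require going offline) are assumptions for illustration.

```python
class SparseTable:
    """Toy BigTable-style table: row key -> family -> column -> value."""

    def __init__(self, families):
        self.families = set(families)  # fixed when the table is created (assumed)
        self.rows = {}

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        # Columns are added freely at run time; only cells with a value are stored.
        self.rows.setdefault(row_key, {}).setdefault(family, {})[column] = value

    def get_family(self, row_key, family):
        # All columns of one family, which are meant to be read together.
        return dict(self.rows.get(row_key, {}).get(family, {}))

    def get(self, row_key, family, column):
        # A single column; returns None for a cell that was never written.
        return self.rows.get(row_key, {}).get(family, {}).get(column)
```

For example, after `t = SparseTable(["info", "contact"])`, a row that only ever receives `info` columns has no `contact` entry at all, so the empty cells cost nothing.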

Cassandra: columns and rows • The column is the basic unit of data • A column is a name-value pair; the value is atomic • The name acts as a key • Each pair has a timestamp • Used to resolve update conflicts and manage old data • A row • Is a collection of columns associated with a row key • This is a coarser-grained key: it identifies a row, not a column • A collection of similar rows is a column family
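
The timestamp is what lets conflicting updates be resolved: the version with the newer timestamp wins. The sketch below models a column as a (name, value, timestamp) triple and a column family as a map from row keys to rows. It is an illustrative model, not Cassandra's internals; the function names are hypothetical, and keeping the current version on a timestamp tie is a simplification of this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int  # assumed: supplied by the writer, e.g. microseconds since the epoch

def resolve(current: Column, incoming: Column) -> Column:
    """Keep whichever version has the newer timestamp (last write wins).
    On a tie, this sketch keeps the current version."""
    return incoming if incoming.timestamp > current.timestamp else current

def write(column_family: dict, row_key: str, col: Column) -> None:
    # A column family maps row keys to rows; a row maps column names to columns.
    row = column_family.setdefault(row_key, {})
    current = row.get(col.name)
    row[col.name] = col if current is None else resolve(current, col)
```

A late-arriving write carrying an older timestamp therefore does not overwrite newer data.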

Cassandra: standard and super columns, and keyspaces • If the columns in a family are simple, it is a standard column family • The rows in a column family do not have to have the same structure • You can add columns to one row without adding them to the other rows in the family • A super column is a pair consisting of a name and a value, where the value is another map of columns • A family of super columns is a super column family • Standard and super column families are kept in keyspaces; a keyspace is essentially a database
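
The nesting is easiest to see as plain maps. This Python sketch lays out a hypothetical keyspace with one standard column family and one super column family; the family names, row keys, and helper functions are invented for illustration and are not a Cassandra API.

```python
# Hypothetical nested-map picture of a keyspace.
keyspace = {
    # Standard column family: row key -> {column name: value}
    "users": {
        "ann": {"email": "ann@example.com", "city": "Denver"},
        "bo": {"email": "bo@example.com"},  # rows need not share the same columns
    },
    # Super column family: row key -> {super column name: {column name: value}}
    "orders": {
        "ann": {
            "order-1": {"item": "book", "qty": "1"},
            "order-2": {"item": "pen", "qty": "3"},
        },
    },
}

def get_super_column(ks, family, row_key, super_name):
    # A super column's value is itself a map of columns.
    return ks[family][row_key][super_name]

def add_column(ks, family, row_key, name, value):
    # Only this row gains the column; other rows in the family are untouched.
    ks[family].setdefault(row_key, {})[name] = value
```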

Cassandra: updates and reads • Updates • The update is first written to a commit log • It then goes to an in-memory store called a memtable • At that point the update has succeeded • Writes are batched in memory and later written to disk in structures called SSTables • Reads • Consistency is variable (tunable) • Level ONE is the default for reads: we get the first replica that responds, even if it is stale • Stale replicas are then brought up to date so that subsequent reads get the newest value; this is called read repair • Good for high read throughput
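
The write path on a single node (commit log, then memtable, then a batched flush to SSTables) can be sketched as a small Python class. This is a toy model: the flush threshold, the choice to clear the whole commit log on flush, and the linear SSTable scan are simplifications and assumptions, not how Cassandra is implemented.

```python
class Node:
    """Toy model of one node's write path: commit log -> memtable -> SSTables."""

    def __init__(self, flush_threshold=3):  # threshold is a hypothetical parameter
        self.commit_log = []   # append-only log, replayed after a crash
        self.memtable = {}     # in-memory store for recent writes
        self.sstables = []     # immutable sorted tables on "disk", newest last
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log first, for durability
        self.memtable[key] = value            # 2. then the memtable
        # The write has now succeeded; flushing to disk happens later, in batches.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the batched memtable out as a sorted, immutable SSTable.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log = []  # simplified: flushed entries no longer need replaying

    def read(self, key):
        if key in self.memtable:                # newest data lives in memory
            return self.memtable[key]
        for table in reversed(self.sstables):   # then newest SSTable first
            for k, v in table:
                if k == key:
                    return v
        return None
```

Because a write only touches an append-only log and an in-memory map, it never has to read or modify data already on disk, which is one reason writes are cheap.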

Cassandra: writes • Level ONE means • The write goes to a commit log and is confirmed to the user • Some writes might be lost if they are not propagated to other replicas • Quorum consistency • For a read, a majority of replicas must respond • The value with the newest timestamp is returned • Replicas without the most recent version are updated by read repair • For a write, the data has to be propagated to a majority of replicas before the write succeeds and the client is notified
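
Quorum reads and writes can be sketched with replicas modeled as Python dicts mapping a key to a (timestamp, value) pair. The function names, the assumption that the first majority of replicas are the ones that answer a read, and the `reachable` parameter for writes are all hypothetical. Because any two majorities of the same replica set overlap, a quorum read always reaches at least one replica that accepted the latest successful quorum write.

```python
def quorum(n_replicas):
    """Size of a majority of n replicas."""
    return n_replicas // 2 + 1

def quorum_read(replicas, key):
    needed = quorum(len(replicas))
    responders = replicas[:needed]  # assume the first `needed` replicas answer
    versions = [r[key] for r in responders if key in r]
    if not versions:
        return None
    newest = max(versions, key=lambda v: v[0])  # newest timestamp wins
    for r in responders:
        if r.get(key) != newest:
            r[key] = newest  # read repair: bring stale responders up to date
    return newest[1]

def quorum_write(replicas, key, value, timestamp, reachable):
    """Write to every reachable replica; succeed only if a majority acknowledged."""
    acks = 0
    for i, r in enumerate(replicas):
        if i in reachable:
            r[key] = (timestamp, value)
            acks += 1
    return acks >= quorum(len(replicas))
```

Note that, as in this sketch, a failed quorum write may still have reached some replicas; the client is simply not told that it succeeded.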

Cassandra writes, continued • Consistency level ALL • All replicas must respond to a read or write • This is very sensitive to nodes being down • Notes • A single application can use different consistency levels for different operations • Cassandra uses a distributed cluster model • No node in a cluster is a master
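
The trade-off between the levels comes down to simple arithmetic: the more replicas that must respond, the fewer can be down. This short Python sketch computes both numbers for ONE, QUORUM, and ALL; the function names are hypothetical and the replication factor of 3 is just an example.

```python
def required_responses(level, n_replicas):
    """Replicas that must respond for a read or write at this level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return n_replicas // 2 + 1
    if level == "ALL":
        return n_replicas
    raise ValueError(f"unknown consistency level {level!r}")

def tolerated_failures(level, n_replicas):
    """Replicas that can be down while requests at this level still succeed."""
    return n_replicas - required_responses(level, n_replicas)

for level in ("ONE", "QUORUM", "ALL"):
    print(level, required_responses(level, 3), tolerated_failures(level, 3))
# ONE 1 2
# QUORUM 2 1
# ALL 3 0
```

With three replicas, ALL fails as soon as a single replica is unavailable, which is why it is so sensitive to nodes being down.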

Cassandra: transactions • No transactions in the traditional sense • You cannot perform a series of reads and writes and then decide whether to commit or abort • Writes are atomic at the row level • So inserting or updating columns in a row is a single write that either succeeds or fails • There are apparently third-party transaction libraries that can be used to coordinate reads and writes and build true atomic transactions
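
Row-level atomicity means a set of column changes to one row becomes visible all at once or not at all. The Python sketch below models that by staging changes on a copy of the row and swapping it in with one assignment. It is a hypothetical illustration, not how Cassandra implements atomicity; the `None`-value check simply stands in for any failure in the middle of a write. Nothing here spans rows: two calls for two different rows can still leave the first applied and the second not, which is the gap transaction libraries try to fill.

```python
class RowStore:
    """Toy model of row-level atomicity: a mutation to one row applies fully or not at all."""

    def __init__(self):
        self.rows = {}  # row key -> {column name: value}

    def apply_row_mutation(self, row_key, updates):
        staged = dict(self.rows.get(row_key, {}))  # work on a private copy of the row
        for name, value in updates.items():
            if value is None:
                # Hypothetical validation failure, standing in for any error mid-write.
                raise ValueError(f"column {name!r} has no value")
            staged[name] = value
        self.rows[row_key] = staged  # one assignment makes every change visible together
```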

Cassandra: query language • First, select your keyspace • Basic get, set, and delete operations • Create a column family • Set a column value • Get one or more column values • Delete a column family • Delete a column • There is also an SQL-like query language (CQL) with SQL-like set queries • We can create indexes on both row keys and column keys

Applications of Cassandra • Content management systems • Blogging systems

Installing Cassandra • Go to: http://cassandra.apache.org/download/ • Download and uncompress • Look at: http://wiki.apache.org/cassandra/GettingStarted • Go to the cassandra folder • Run bin/cassandra -f • On my Mac, I needed to use sudo • I also had to create the cassandra folders listed in the GettingStarted instructions • Try running bin/cassandra-cli (the command-line interface)

Or to get it with a GUI • Go to: http://blog.shelan.org/2012/06/cassandra-gui-20-making-things-little.html • Run wso2server.sh (or wso2server.bat on Windows) • Go to https://localhost:9443 • Log in at https://your-ip-address:9443/services (NOT localhost)

Another choice • Go to: http://www.datastax.com/resources/articles/getting-started-with-apache-cassandra • Install • Run it • Go to: http://localhost:8888/opscenter/index.html • To explore example db: http://localhost:8888/opscenter/online_help/docs/explorer/index.html

Note on Windows 7 • You might have to set your JAVA_HOME environment variable • Usually c:\Progra~1\Java\jdk1.7.0 (or similar)

PostgreSQL: install • Go to: http://bitnami.org/stacks • Install WAPP (Windows) or MAPP (Mac) • Start the web server • Start PostgreSQL • Go to: http://127.0.0.1/phppgadmin/
