Column-based DBs: BigTable, HBase, SimpleDB, and Cassandra
But first, the third assignment • This is due on Monday, the 18th, by the beginning of class • As with the first assignment, contact the grader when you are done • Build a Neo4j database with the Neo4j web GUI (localhost:7474) and Cypher and/or Gremlin • Note that the Console tab gives access to the documentation • Also note that the Console tab gives access to Gremlin • You can use either Cypher or Gremlin (or both) to do your assignment
3rd assignment, continued • Your Neo4j database • Model customer sites and service personnel • Use at least 15 sites and 6 personnel • Each site is a node • Each service person is a node • As calls come in • a property is created for the given site that describes the nature of the problem • a person is assigned to the site (and a relationship is made) • Each site node has a property that specifies the nature of its problem • Each person has a property that specifies the sorts of problems he/she can solve
3rd assignment, continued • Support the following operations • Creating a site • Creating a service person • Assigning a problem property to a site; a site can have many of these • Assigning a specialty to a service person; a person can have many of these • Assigning a person to a site • Removing a problem property and the relationship that corresponds to it • Removing a site • Removing a service person • Anything you want to add…
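The operations listed above can be prototyped in plain Python before writing any Cypher or Gremlin. This is only a rough sketch of the data model, not Neo4j code: the class name `ServiceGraph`, its method names, and the choice to store each relationship as a `(person, site, problem)` tuple are all assumptions made for illustration.

```python
class ServiceGraph:
    """Hypothetical in-memory stand-in for the assignment's graph (not Neo4j code)."""

    def __init__(self):
        self.sites = {}        # site name -> set of open problems
        self.personnel = {}    # person name -> set of specialties
        self.assigned = set()  # (person, site, problem) relationships

    def create_site(self, site):
        self.sites.setdefault(site, set())

    def create_person(self, person):
        self.personnel.setdefault(person, set())

    def add_problem(self, site, problem):
        self.sites[site].add(problem)

    def add_specialty(self, person, specialty):
        self.personnel[person].add(specialty)

    def assign(self, person, site, problem):
        # A relationship only makes sense for a known person and an open problem.
        if person not in self.personnel:
            raise KeyError(person)
        if problem not in self.sites[site]:
            raise ValueError(f"{site} has no open problem {problem!r}")
        self.assigned.add((person, site, problem))

    def remove_problem(self, site, problem):
        # Drop the problem property and every relationship that corresponds to it.
        self.sites[site].discard(problem)
        self.assigned = {r for r in self.assigned if r[1:] != (site, problem)}

    def remove_site(self, site):
        del self.sites[site]
        self.assigned = {r for r in self.assigned if r[1] != site}

    def remove_person(self, person):
        del self.personnel[person]
        self.assigned = {r for r in self.assigned if r[0] != person}


graph = ServiceGraph()
graph.create_site("site-1")
graph.create_person("alice")
graph.add_problem("site-1", "network outage")
graph.add_specialty("alice", "network outage")
graph.assign("alice", "site-1", "network outage")
print(sorted(graph.assigned))
```

In the real assignment each dictionary entry would become a node, each set member a property value, and each tuple a relationship.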
Column-based DBs • BigTable • First notable column-based DB • No schema • Sparse tables, i.e., empty columns are not stored • Groups (or families) of columns stored together
Basic concepts • First column is a key • Column structure is next • Group of columns • We can select all or a given column • Idea is that the group is often accessed together • Generally, new columns can be added to a row at run time, but new families might require going offline
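The ideas on this slide can be illustrated with a small in-memory sketch. It is not the API of BigTable or HBase; the class `SparseTable` and its methods are hypothetical names. It assumes families are fixed when the table is created, while columns inside a family can be added to any row at run time, and columns that were never set take no space.

```python
from collections import defaultdict


class SparseTable:
    """Hypothetical sketch: row key -> column family -> column -> value."""

    def __init__(self, families):
        self.families = set(families)  # fixed when the table is created
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        # New columns are fine at run time; new families are not.
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][column] = value

    def get(self, row_key, family, column):
        # Select a single column; a column that was never set is simply absent.
        return self.rows.get(row_key, {}).get(family, {}).get(column)

    def get_family(self, row_key, family):
        # Select the whole group of columns that is usually accessed together.
        return dict(self.rows.get(row_key, {}).get(family, {}))


table = SparseTable(families=["profile", "stats"])
table.put("user:1", "profile", "name", "Ada")
table.put("user:1", "profile", "email", "ada@example.org")
table.put("user:2", "profile", "name", "Grace")  # no email column for this row
print(table.get_family("user:1", "profile"))
print(table.get("user:2", "profile", "email"))
```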
Cassandra: columns and rows • Basic unit of data • A column is a name-value pair, the value is atomic • The name is a key • Each pair has a timestamp • Used to manage update conflicts and old data • A row • Is a collection of columns associated with a row key • This is a larger grained key – for a row, not a column • A collection of similar rows is a column family
Cassandra: standard and super columns, and keyspaces • If the columns in a family are simple, it is a standard column family • The rows in a column family do not have to have the same structure • You can add columns to rows without having to do it to other rows in the family • A super column is a pair consisting of a name and a value, where the value is another map of columns • Standard and super column families are kept in a keyspace, which is essentially a database
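The column, row, and column-family vocabulary from the last two slides can be made concrete with a short sketch. This is a toy model, not the Cassandra client API: `Column`, `upsert`, and the sample data are hypothetical, and resolving conflicts by keeping the column with the newest timestamp is stated here as an assumption about how the timestamp is used.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Column:
    """A name-value pair with a timestamp, the basic unit of data."""
    name: str
    value: Any
    timestamp: int


Row = Dict[str, Column]  # a row is a collection of columns under one row key


def upsert(row: Row, column: Column) -> None:
    # Assumed conflict rule: the column with the newest timestamp wins.
    current = row.get(column.name)
    if current is None or column.timestamp >= current.timestamp:
        row[column.name] = column


# A standard column family: row key -> row. Rows need not share a structure.
users: Dict[str, Row] = {"u1": {}, "u2": {}}
upsert(users["u1"], Column("email", "a@example.org", timestamp=1))
upsert(users["u1"], Column("email", "b@example.org", timestamp=3))
upsert(users["u1"], Column("email", "old@example.org", timestamp=2))  # ignored
upsert(users["u2"], Column("phone", "555-0100", timestamp=1))

# A super column: a name whose value is itself a map of columns.
address = ("address", {"city": Column("city", "Springfield", timestamp=1)})

print(users["u1"]["email"].value)
print(sorted(users["u2"]))
```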
Cassandra: updates and reads • Updates • The commit log is written to • The update goes to an in-memory store called the memtable • At this point the write has succeeded • Writes are batched in memory and periodically flushed to on-disk structures called SSTables • Variable consistency • Consistency level 1 is the default for reads: we get the answer from the first replica, even if it is stale • A read repair then updates stale replicas, so subsequent reads get the newest value • Good for high read throughput
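The update path on this slide (commit log, then memtable, then a batched flush to an SSTable) can be sketched as follows. This is a single-node toy under stated assumptions, not Cassandra's implementation: the class name `WritePathSketch`, the `memtable_limit` flush trigger, and the use of Python lists and dicts in place of on-disk files are all hypothetical.

```python
class WritePathSketch:
    """Hypothetical single-node model of commit log -> memtable -> SSTable."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory store for recent writes
        self.sstables = []     # flushed, never-modified tables, oldest first
        self.memtable_limit = memtable_limit  # assumed flush trigger

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log the write
        self.memtable[key] = value            # 2. apply it in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. batch out to an SSTable
        return True                           # the write counts as successful

    def flush(self):
        # Keys are written in sorted order, then the memtable starts empty.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable first
            if key in table:
                return table[key]
        return None


node = WritePathSketch(memtable_limit=2)
node.write("a", 1)
node.write("b", 2)   # reaches the limit, so the memtable is flushed
node.write("a", 3)   # newer value lives in the memtable
print(node.read("a"), node.read("b"), len(node.sstables))
```

Because flushed tables are never modified, a newer value for the same key lives in a later table or in the memtable, which is why the read checks the newest data first.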
Cassandra: writes • Level 1 means • Writes to a commit log and confirms to the user • Some writes might be lost if they are not propagated to other replicas • Quorum consistency • For a read, a majority of the replicas must respond • And the value with the newest timestamp is returned • Nodes without the most recent version must do a read repair • A write has to be propagated to a majority of nodes before it is successful and the client is notified
Cassandra writes, continued • The consistency level All • All nodes must respond to a read or write • This is very sensitive to nodes being down • Notes • A single application can use varying levels of consistency • Uses a distributed cluster model • No node in a cluster is a master
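The three consistency levels can be compared with a small simulation. It is a deliberately simplified sketch, not how Cassandra routes requests: `Replica`, `required`, `write`, and `read` are hypothetical names, a write is assumed to reach every live replica, a read is assumed to contact the first live replicas in list order, and read repair is applied only to the replicas that were contacted.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}  # key -> (timestamp, value)


def required(level, replica_count):
    # How many replicas must respond at each consistency level.
    return {"ONE": 1, "QUORUM": replica_count // 2 + 1, "ALL": replica_count}[level]


def write(replicas, key, value, timestamp, level):
    live = [r for r in replicas if r.up]
    if len(live) < required(level, len(replicas)):
        return False  # not enough replicas to meet the level
    for r in live:
        r.data[key] = (timestamp, value)
    return True


def read(replicas, key, level):
    live = [r for r in replicas if r.up]
    need = required(level, len(replicas))
    if len(live) < need:
        raise RuntimeError("not enough live replicas")
    contacted = live[:need]
    answers = [r.data[key] for r in contacted if key in r.data]
    if not answers:
        return None
    newest = max(answers, key=lambda a: a[0])  # newest timestamp wins
    for r in contacted:                        # read repair on stale replicas
        if r.data.get(key) != newest:
            r.data[key] = newest
    return newest[1]


a, b, c = Replica("a"), Replica("b"), Replica("c")
cluster = [a, b, c]
a.up = False
ok_quorum = write(cluster, "k", "v1", 1, "QUORUM")  # 2 of 3 replicas are up
ok_all = write(cluster, "k", "v2", 2, "ALL")        # fails while one node is down
a.up = True
stale = read(cluster, "k", "ONE")      # replica a missed the write
fresh = read(cluster, "k", "QUORUM")   # a majority answers and a is repaired
print(ok_quorum, ok_all, stale, fresh)
```

The run shows the trade-off from the slides: level ALL is unavailable as soon as one node is down, level ONE can return stale data, and a quorum read both returns the newest value and repairs the lagging replica.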
Cassandra: transactions • Transactions • Cannot perform a series of reads and writes and then decide whether to abort • But there are apparently third-party libraries that can be used to create true atomic transactions • Writes are atomic at the row level • So a column insertion or update is a single write that succeeds or fails • There are transaction libraries that can be used to coordinate reads and writes
Cassandra: query language • First, set your keyspace • Query language • Basic Get, Set, Delete operations • Create a column family • Set a column value • Get a column value or values • Delete a column family • Delete a column • There are SQL-like commands • SQL-like set queries • We can create indices on both row keys and column keys
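The basic operations listed on this slide map onto a nested dictionary. The sketch below only imitates their meaning in Python; it is not the syntax of cassandra-cli or of the SQL-like language, and `KeyspaceSketch` with its method names is hypothetical.

```python
class KeyspaceSketch:
    """Hypothetical model: family name -> row key -> column name -> value."""

    def __init__(self, name):
        self.name = name
        self.families = {}

    def create_column_family(self, family):
        self.families.setdefault(family, {})

    def set(self, family, row_key, column, value):
        self.families[family].setdefault(row_key, {})[column] = value

    def get(self, family, row_key, column=None):
        # With no column given, return every column stored for the row.
        row = self.families[family].get(row_key, {})
        return dict(row) if column is None else row.get(column)

    def delete_column(self, family, row_key, column):
        self.families[family].get(row_key, {}).pop(column, None)

    def drop_column_family(self, family):
        self.families.pop(family, None)


ks = KeyspaceSketch("demo")              # "first, set your keyspace"
ks.create_column_family("posts")
ks.set("posts", "post:1", "title", "Hello")
ks.set("posts", "post:1", "body", "First entry")
print(ks.get("posts", "post:1", "title"))
print(ks.get("posts", "post:1"))
ks.delete_column("posts", "post:1", "body")
ks.drop_column_family("posts")
```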
Applications of Cassandra • Content management systems • Blogging systems
Installing Cassandra • Go to: http://cassandra.apache.org/download/ • Download and un-compress • Look at: http://wiki.apache.org/cassandra/GettingStarted • Go to the cassandra folder • Run bin/cassandra -f • On my Mac, I needed to use sudo • I also had to create the cassandra folders listed in the GettingStarted instructions • Try running bin/cassandra-cli (command line interface)
Or to get it with a GUI • Go to: http://blog.shelan.org/2012/06/cassandra-gui-20-making-things-little.html • Run wso2server.sh (or bat) • Go to https://localhost:9443 • Log in at https://your-ip-address:9443/services (NOT localhost)
Another choice • Go to: http://www.datastax.com/resources/articles/getting-started-with-apache-cassandra • Install • Run it • Go to: http://localhost:8888/opscenter/index.html • To explore example db: http://localhost:8888/opscenter/online_help/docs/explorer/index.html
Note on Windows 7 • You might have to set your JAVA_HOME variable • Usually c:\Progra~1\Java\jdk1.7.0 (or similar)
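A quick way to see whether JAVA_HOME is the problem is to inspect the environment before starting Cassandra. The helper below is a small sketch; the function name `check_java_home` and its messages are made up for illustration, and it only checks that the variable is set and points at an existing directory.

```python
import os
from pathlib import Path


def check_java_home(env=os.environ):
    """Hypothetical helper: report whether JAVA_HOME looks usable."""
    value = env.get("JAVA_HOME")
    if not value:
        return "JAVA_HOME is not set"
    if not Path(value).is_dir():
        return f"JAVA_HOME points to a missing directory: {value}"
    return f"JAVA_HOME looks usable: {value}"


print(check_java_home())
```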
PostgreSQL: install • Go to: http://bitnami.org/stacks • Install WAPP (Windows) or MAPP (Mac) • Start up the web server • Start up PostgreSQL • Go to: http://127.0.0.1/phppgadmin/