Graph databases …the other end of the NoSQL spectrum. Material taken from NoSQL Distilled and Seven Databases in Seven Weeks
Master-slave replication • The master holds the primary copies • This provides high availability for reads • A lot of applications are very read heavy • Sometimes, the role of master can be moved to another machine • A master/slave environment can lead to inconsistency, as in any replicated database
Fundamental components • Nodes • Relationships • Properties • A query is a graph traversal … in effect, a path expression
Details • No particular semantic limit on the number of relationships a node can take part in • Normally we don’t use multiple servers • Using Neo4J as our example • Relationships are directed (uni-directional) • Relationships can have properties, and property values can be single values or collections
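A minimal Python sketch of the property-graph model described above: nodes and directed relationships, each carrying properties. The names used here (Node, Relationship, Graph, add_node, relate, outgoing) are hypothetical and are not Neo4J's API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int          # id of the node the relationship leaves
    end: int            # id of the node the relationship enters
    rel_type: str
    properties: dict = field(default_factory=dict)


class Graph:
    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def relate(self, start, end, rel_type, **properties):
        # Both endpoints must already exist.
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both endpoints must exist")
        self.relationships.append(Relationship(start, end, rel_type, properties))

    def outgoing(self, node_id):
        # Direction matters: only relationships leaving node_id are returned.
        return [r for r in self.relationships if r.start == node_id]


g = Graph()
g.add_node(1, name="Alice")
g.add_node(2, name="Bob")
# A property may hold a single value or a collection.
g.relate(1, 2, "KNOWS", since=2010, met_at=["school", "work"])
```

Here g.outgoing(1) holds the one KNOWS relationship, while g.outgoing(2) is empty because the relationship is directed.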
Applications for graph DBs • Social networks • Learning systems • Systems that must route energy, messages, email, etc. • Mapping and location-aware systems
Transactions • ACID transactions • Within a server, ACID properties hold • In a cluster, an update to a primary copy (the copy on the master server) is eventually propagated to all replicas (on the slaves) and copies on slaves are always available for reading • In a cluster, writes to slaves may not be immediately propagated to the master and updates to other slaves might take time, but the master is committed first
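The cluster behaviour above can be illustrated with a toy Python model: the master commits first, slaves catch up later, and a read from a slave can be stale in between. This is only a sketch; Server, Cluster, write, propagate and read_from_slave are hypothetical names, and real replication is considerably more involved.

```python
class Server:
    def __init__(self):
        self.data = {}


class Cluster:
    def __init__(self, n_slaves):
        self.master = Server()
        self.slaves = [Server() for _ in range(n_slaves)]
        self.pending = []  # updates committed on the master, not yet on slaves

    def write(self, key, value):
        # The master commits first; slaves catch up later.
        self.master.data[key] = value
        self.pending.append((key, value))

    def propagate(self):
        # Apply every pending update to every slave, in commit order.
        for key, value in self.pending:
            for slave in self.slaves:
                slave.data[key] = value
        self.pending.clear()

    def read_from_slave(self, index, key):
        # Slaves are always readable, but may lag behind the master.
        return self.slaves[index].data.get(key)


cluster = Cluster(n_slaves=2)
cluster.write("alice", "v1")
stale = cluster.read_from_slave(0, "alice")   # slave has not caught up yet
cluster.propagate()
fresh = cluster.read_from_slave(0, "alice")   # slave now matches the master
```

The gap between the two reads is the window in which a read-heavy application can see inconsistent data.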
Constraints and transactions • A relationship must always have two nodes • A node must have no relationships to be deleted • A read does not have to be packaged as a transaction • To change a node or relationship, there must be a transaction wrapped around the query • There is a rollback operation for failed transactions
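A small Python sketch of these rules, assuming an in-memory store: relationships need two existing nodes, a node with relationships cannot be deleted, and a failed transaction is rolled back. TinyGraph and transaction are hypothetical names for illustration, not Neo4J's transaction API.

```python
import copy
from contextlib import contextmanager


class TinyGraph:
    def __init__(self):
        self.nodes = {}          # node id -> properties
        self.rels = set()        # (start id, type, end id)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def relate(self, start, rel_type, end):
        if start not in self.nodes or end not in self.nodes:
            raise ValueError("a relationship needs two existing nodes")
        self.rels.add((start, rel_type, end))

    def delete_node(self, node_id):
        if any(node_id in (s, e) for s, _, e in self.rels):
            raise ValueError("node still has relationships")
        del self.nodes[node_id]


@contextmanager
def transaction(graph):
    # Snapshot the state; restore it if the block raises.
    snapshot = (copy.deepcopy(graph.nodes), set(graph.rels))
    try:
        yield graph
    except Exception:
        graph.nodes, graph.rels = snapshot
        raise


g = TinyGraph()
with transaction(g) as tx:
    tx.add_node("a")
    tx.add_node("b")
    tx.relate("a", "KNOWS", "b")

try:
    with transaction(g) as tx:
        tx.add_node("c")
        tx.delete_node("a")      # fails: "a" still has a relationship
except ValueError:
    pass                         # the whole transaction was rolled back
```

After the failed transaction, node "c" is gone again and node "a" with its relationship is still in place.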
Programs on graph DBs • There are graph traversing languages • (A big thing on the semantic web, actually, where two assertions can be linked to create an inference) • In Neo4J we can index nodes by their properties • It uses Lucene, a high-powered indexing engine • We can also traverse a graph via relationships • We can also find all nodes related to a given node by some particular relationship • We can search for the shortest path between two given nodes
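As a rough illustration of two of these operations, the Python sketch below looks nodes up through a property index and then finds a shortest path with breadth-first search. The data and the names by_name and shortest_path are made up for the example; a plain dictionary stands in for the Lucene index, and Neo4J's own path-finding may work differently.

```python
from collections import deque

# Hypothetical data: node id -> properties, and directed relationships.
nodes = {1: {"name": "Alice"}, 2: {"name": "Bob"},
         3: {"name": "Carol"}, 4: {"name": "Dave"}}
edges = {1: [2, 3], 2: [4], 3: [4], 4: []}

# Property index: name -> node id (a dict standing in for a real index).
by_name = {props["name"]: node_id for node_id, props in nodes.items()}


def shortest_path(start, goal):
    """Breadth-first search; returns a list of node ids, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour in edges[current]:
            if neighbour not in previous:
                previous[neighbour] = current
                queue.append(neighbour)
    return None


path = shortest_path(by_name["Alice"], by_name["Dave"])   # [1, 2, 4]
```

Because relationships are followed only in their stored direction, shortest_path(4, 1) finds nothing in this example.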
Issues of scale • Distribution can be done via sharding on key-value and document NoSQL databases • With a graph database, since it is a non-aggregate-oriented db, sharding is difficult because traversing a relationship can force us across a server boundary • In a graph db, we can try to put a lot of a graph in memory • We can add slaves and ramp up read access • We can carefully shard, knowing something about how the relationships are commonly traversed
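To make the sharding problem concrete, the following Python sketch counts how many relationships cross a server boundary under two node placements. The graph, the placements, and the function name cross_shard_edges are invented for illustration.

```python
def cross_shard_edges(edges, shard_of):
    """Count relationships whose endpoints live on different servers."""
    return sum(1 for a, b in edges if shard_of[a] != shard_of[b])


# Two tightly connected groups joined by a single relationship.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("x", "y"), ("y", "z"), ("x", "z"),
         ("c", "x")]

# Naive placement: alternate nodes between servers 0 and 1.
naive = {"a": 0, "b": 1, "c": 0, "x": 1, "y": 0, "z": 1}
# Placement that follows the relationship structure.
careful = {"a": 0, "b": 0, "c": 0, "x": 1, "y": 1, "z": 1}

naive_cost = cross_shard_edges(edges, naive)       # 5 of 7 relationships cross
careful_cost = cross_shard_edges(edges, careful)   # only ("c", "x") crosses
```

Every crossing relationship is a potential network hop during traversal, which is why knowing the common traversal patterns matters when sharding.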
Installing Neo4J • http://www.neo4j.org/install • Just unzip the file you downloaded; in the resulting directory (e.g. neo4j-community-1.9.M04) you'll find: • configuration files under conf/ • database files under data/graph.db • logs under data/log and data/graph.db/messages.log • start and other scripts under bin/ • Just start Neo4j Server with bin/neo4j start in a command line console or terminal of your choice, then open http://localhost:7474 for the web interface.
Gremlin • Written in Groovy (used in Grails) • (or use Java) • Terminology • Vertex • Edge • Query • Locate vertices • List properties of vertices • Filter vertices by properties • Traverse edges from vertices
Pipes • Gremlin takes a collection and outputs another collection • Collections • Vertices • Edges • Property values • Each step is a “pipe”
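The pipe idea can be imitated in Python with generators: each step takes a collection and yields another collection. This is not Gremlin syntax; the data and the function names (all_vertices, has, out, values) are hypothetical stand-ins chosen for this sketch.

```python
# Hypothetical in-memory graph: vertices with properties, directed edges.
vertices = {
    1: {"name": "Alice", "age": 34},
    2: {"name": "Bob", "age": 27},
    3: {"name": "Carol", "age": 41},
}
out_edges = {1: [2, 3], 2: [3], 3: []}


def all_vertices():
    # Source pipe: emits every vertex id.
    yield from vertices


def has(stream, key, predicate):
    # Filter pipe: keep vertices whose property passes the predicate.
    return (v for v in stream if predicate(vertices[v][key]))


def out(stream):
    # Traversal pipe: follow outgoing edges from each vertex.
    return (w for v in stream for w in out_edges[v])


def values(stream, key):
    # Property pipe: vertices in, property values out.
    return (vertices[v][key] for v in stream)


# Vertices younger than 30 -> their outgoing neighbours -> the names.
names = list(values(out(has(all_vertices(), "age", lambda a: a < 30)), "name"))
```

Nothing is computed until list() pulls items through the chain, which mirrors how one step feeds the next.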
Cypher • START – at a node • Found via node ID, list of node IDs, or by an index lookup • MATCH – for examining patterns in relationships • WHERE – filters properties on a node or relationship • RETURN – defines retrieval set of nodes, relationships, or properties of nodes or relationships • ORDER BY and aggregation functions, too • See http://docs.neo4j.org/chunked/milestone/cypher-query-lang.html
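To show how the clauses fit together, here is a Python sketch that mimics each stage on a small in-memory graph. It is an emulation, not Cypher itself: the data is invented, and the query in the comment is only an approximation of the older START-based syntax that has not been checked against a server.

```python
# Roughly the query being imitated (illustrative only):
#   START a=node(1) MATCH a-[:KNOWS]->b WHERE b.age > 30
#   RETURN b.name ORDER BY b.name

nodes = {
    1: {"name": "Alice", "age": 34},
    2: {"name": "Bob", "age": 27},
    3: {"name": "Dave", "age": 41},
    4: {"name": "Erin", "age": 50},
    5: {"name": "Carol", "age": 38},
}
rels = [(1, "KNOWS", 2), (1, "KNOWS", 3), (1, "WORKS_WITH", 4), (1, "KNOWS", 5)]

# START: pick the starting node(s) by id.
start = [1]

# MATCH: find (a, b) pairs connected by an outgoing KNOWS relationship.
matched = [(a, end) for a in start
           for (begin, rel_type, end) in rels
           if begin == a and rel_type == "KNOWS"]

# WHERE: filter on a property of the matched node.
filtered = [(a, b) for a, b in matched if nodes[b]["age"] > 30]

# RETURN ... ORDER BY: project one property and sort the result.
result = sorted(nodes[b]["name"] for _, b in filtered)
```

Bob is dropped by the age filter and Erin by the relationship type, leaving Carol and Dave in sorted order.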
Second assignment: to be further described in class • Build a MongoDB database using RockMongo (or other GUI) or the Mongo shell • The database will contain a set of documents describing content of your choice – use either your GUI or the Mongo shell to do the following • The documents must support and show embedding • Create at least ten documents • Delete documents • Search documents and return non-empty sets • Perform at least two map-reduce operations • Contact the grader at: haojie.hang@gmail.com to demo
Third assignment: lookahead… • Build a Neo4J database with Cypher • The Neo4J database will • Use Cypher to…