Graph databases

Graph databases … the other end of the NoSQL spectrum. Material taken from NoSQL Distilled and Seven Databases in Seven Weeks.

Master-slave replication • The master holds the primary copies; slaves hold replicas • This provides high availability for reads • Many applications are very read-heavy • If needed, the master role can be moved to another machine • A master/slave environment can lead to inconsistency, as in any replicated database
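A toy model makes the read-availability and inconsistency points concrete. The sketch below is a hypothetical in-memory simulation (the class and method names are invented for illustration, not Neo4j's API): the master keeps an ordered update log, a slave catches up only when `replicate()` runs, reads rotate across slaves, and `promote()` moves the master role to a slave.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare replicas by identity, not by contents
class Replica:
    """One copy of the data plus how far through the master's update log it has got."""
    name: str
    data: dict = field(default_factory=dict)
    applied: int = 0


class MasterSlaveCluster:
    """Hypothetical model: the master owns the primary copy and an ordered
    update log; a slave only catches up when replicate() is called."""

    def __init__(self, slave_names):
        self.master = Replica("master")
        self.log = []  # (key, value) updates in commit order
        self.slaves = [Replica(name) for name in slave_names]
        self._next = 0  # round-robin cursor for spreading reads

    def write(self, key, value):
        # Every write lands on the primary copy first.
        self.master.data[key] = value
        self.log.append((key, value))
        self.master.applied = len(self.log)

    def replicate(self, slave):
        # Apply the log entries this slave has not seen yet.
        for key, value in self.log[slave.applied:]:
            slave.data[key] = value
        slave.applied = len(self.log)

    def read(self, key):
        # Reads are spread over the slaves; a lagging slave returns stale data.
        slave = self.slaves[self._next % len(self.slaves)]
        self._next += 1
        return slave.data.get(key)

    def promote(self, slave):
        # Failover: bring the chosen slave up to date, then swap roles.
        self.replicate(slave)
        self.slaves.remove(slave)
        self.slaves.append(self.master)
        self.master = slave
```

A read that lands on a slave that has not replicated yet returns stale data (here `None`), which is the inconsistency mentioned above.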

Fundamental components • Nodes • Relationships • Properties • A query is a graph traversal … in effect, a path expression

Details • There is no particular semantic limit on the number of relationships a node can take part in • Normally we don't use multiple servers • Using Neo4J as our example • Relationships are directed (uni-directional) • Relationships can have properties, whose values can be single values or collections
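The components and details above can be sketched as a tiny property graph. This is an illustrative, hypothetical structure (not how Neo4j stores data): nodes carry a property dict, each relationship is directed and has its own properties, a node's relationship list has no fixed cap, and `follow()` evaluates a simple path expression by following one relationship type per step.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Relationship:
    start: int      # relationships are directed: start -> end
    end: int
    rel_type: str
    props: dict = field(default_factory=dict)  # values may be single values or lists


class PropertyGraph:
    """Hypothetical minimal property graph; not Neo4j's storage format."""

    def __init__(self):
        self.nodes = {}                    # node id -> property dict
        self.outgoing = defaultdict(list)  # node id -> relationships; no fixed cap per node

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def relate(self, start, end, rel_type, **props):
        rel = Relationship(start, end, rel_type, props)
        self.outgoing[start].append(rel)
        return rel

    def follow(self, start, path):
        """Evaluate a simple path expression: follow each relationship type in turn."""
        frontier = {start}
        for rel_type in path:
            frontier = {r.end for n in frontier for r in self.outgoing[n]
                        if r.rel_type == rel_type}
        return frontier
```

Because relationships are directed, following `KNOWS` from the end node of a `KNOWS` relationship finds nothing unless a relationship in the other direction also exists.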

Applications for graph DBs • Social networks • Learning systems • Systems that must route energy, messages, email, etc. • Mapping and location-aware systems

Transactions • ACID transactions • Within a server, ACID properties hold • In a cluster, an update to a primary copy (the copy on the master server) is eventually propagated to all replicas (on the slaves), and copies on slaves are always available for reading • In a cluster, a write sent to a slave is synchronized with the master and committed there first, then on that slave; propagation to the other slaves may take time
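The cluster write path can be illustrated with another hypothetical toy model, assuming the sequence described above: the master commits first, then the slave that received the write, and the other slaves catch up later. It records the order in which servers commit; it is a sketch, not Neo4j's replication protocol.

```python
class HACluster:
    """Hypothetical model of the write path described above, not Neo4j's actual protocol."""

    def __init__(self, n_slaves):
        self.master = {}                  # primary copy
        self.master_log = []              # committed updates, in commit order
        self.slaves = [{} for _ in range(n_slaves)]
        self.applied = [0] * n_slaves     # how much of master_log each slave has applied
        self.commit_order = []            # which server committed, in order

    def _catch_up(self, i):
        # Replay every master update slave i has not applied yet.
        for key, value in self.master_log[self.applied[i]:]:
            self.slaves[i][key] = value
        self.applied[i] = len(self.master_log)
        self.commit_order.append(f"slave{i}")

    def write_via_slave(self, i, key, value):
        # 1. Slave i forwards the write to the master, which commits it first.
        self.master[key] = value
        self.master_log.append((key, value))
        self.commit_order.append("master")
        # 2. Slave i then commits, picking up any earlier updates it had missed.
        self._catch_up(i)

    def sync(self, i):
        # 3. Other slaves see the write only when they next pull from the master.
        self._catch_up(i)
```

After `write_via_slave(0, "x", 1)` the commit order is master then `slave0`, while `slave1` does not see `x` until `sync(1)` runs.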

Constraints and transactions • A relationship must always have two nodes (a start node and an end node) • A node can be deleted only if it has no relationships • A read does not have to be wrapped in a transaction • To change a node or relationship, the query must be wrapped in a transaction • Failed transactions can be rolled back
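The rules above can be mimicked in a few lines. The sketch is hypothetical (`TinyGraphStore`, `ConstraintError`, and the transaction helper are invented names, not Neo4j's API): reads run freely, every write must happen inside a transaction, relationships need two existing endpoints, nodes with relationships cannot be deleted, and any error rolls the transaction back to a snapshot.

```python
class ConstraintError(Exception):
    pass


class TinyGraphStore:
    """Hypothetical store illustrating the rules above; not Neo4j's API."""

    def __init__(self):
        self.nodes = set()
        self.rels = []          # (start, end) pairs: every relationship has two endpoints
        self._in_tx = False

    def degree(self, node):
        # Reads need no transaction.
        return sum(node in (s, e) for s, e in self.rels)

    def _require_tx(self):
        if not self._in_tx:
            raise ConstraintError("writes must run inside a transaction")

    def create_node(self, node):
        self._require_tx()
        self.nodes.add(node)

    def create_rel(self, start, end):
        self._require_tx()
        if start not in self.nodes or end not in self.nodes:
            raise ConstraintError("a relationship needs an existing start and end node")
        self.rels.append((start, end))

    def delete_node(self, node):
        self._require_tx()
        if self.degree(node) > 0:
            raise ConstraintError("delete the node's relationships first")
        self.nodes.discard(node)

    def transaction(self):
        return _Tx(self)


class _Tx:
    """Context manager: commit on success, roll back to a snapshot on any exception."""

    def __init__(self, store):
        self.store = store

    def __enter__(self):
        self.snapshot = (set(self.store.nodes), list(self.store.rels))
        self.store._in_tx = True
        return self.store

    def __exit__(self, exc_type, exc, tb):
        self.store._in_tx = False
        if exc_type is not None:
            self.store.nodes, self.store.rels = self.snapshot  # rollback
        return False  # let the error propagate to the caller
```

A snapshot copy is the simplest way to show rollback here; a real database would use a transaction log instead.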

Programs on graph DBs • There are graph traversal languages • (A big thing on the semantic web, actually, where two assertions can be linked to create an inference) • In Neo4J we can index nodes by their properties • It uses Lucene, a high-powered indexing engine • We can also traverse a graph via relationships • We can also find all nodes related to a given node by particular relationships • We can search for the shortest path between two given nodes
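The three kinds of program above (property index lookup, related-node lookup, shortest path) can be sketched as follows. The dict-based index is only a stand-in for Lucene, breadth-first search is used for the shortest path because the graph is unweighted, and relationships are stored in both directions here so traversals can follow them either way; all names are hypothetical.

```python
from collections import defaultdict, deque


class IndexedGraph:
    def __init__(self):
        self.props = {}                # node -> property dict
        self.index = defaultdict(set)  # (property, value) -> nodes; stand-in for a Lucene index
        self.adj = defaultdict(list)   # node -> [(rel_type, neighbour)]

    def add_node(self, node, **props):
        self.props[node] = props
        for key, value in props.items():
            self.index[(key, value)].add(node)

    def relate(self, a, b, rel_type):
        # Stored both ways so a traversal can follow the relationship in either direction.
        self.adj[a].append((rel_type, b))
        self.adj[b].append((rel_type, a))

    def lookup(self, key, value):
        return self.index.get((key, value), set())

    def related(self, node, rel_type):
        return {n for t, n in self.adj[node] if t == rel_type}

    def shortest_path(self, src, dst):
        """Breadth-first search: the first time dst is reached, the path has the fewest hops."""
        parent = {src: None}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node == dst:
                path = []
                while node is not None:
                    path.append(node)
                    node = parent[node]
                return path[::-1]
            for _, nxt in self.adj[node]:
                if nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
        return None  # no path between src and dst
```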

Issues of scale • Distribution can be done via sharding in key-value and document NoSQL databases • A graph database is not aggregate-oriented, so sharding is difficult: traversing a relationship may force us to cross a server boundary • In a graph DB, we can try to keep as much of the graph as possible in memory • We can add slaves to scale up read access • We can shard carefully, using knowledge of how relationships are commonly traversed
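A small experiment shows why naive sharding hurts traversals. Assuming a hypothetical graph of two tightly connected groups joined by one edge, the sketch counts how many relationships cross a server boundary under key-based placement versus domain-aware placement; both placements put four nodes on each of two servers.

```python
def cross_shard_edges(edges, shard_of):
    """Count relationships whose two endpoints are stored on different servers;
    traversing any of them means crossing a server boundary."""
    return sum(shard_of(a) != shard_of(b) for a, b in edges)


# Hypothetical social graph: two densely connected regional groups, one bridge edge.
EUROPE = ["e0", "e1", "e2", "e3"]
ASIA = ["a0", "a1", "a2", "a3"]
EDGES = [(x, y) for group in (EUROPE, ASIA)
         for i, x in enumerate(group) for y in group[i + 1:]]
EDGES.append(("e0", "a0"))


def by_key(node):
    # Placement by key alone (parity of the numeric suffix), ignoring relationships.
    return int(node[1:]) % 2


def by_region(node):
    # Domain-aware placement: keep each regional community on its own server.
    return 0 if node.startswith("e") else 1
```

With `by_key`, 8 of the 13 relationships cross servers; with `by_region`, only the single bridge does.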

Installing Neo4J • http://www.neo4j.org/install • Unzip the downloaded file; in the resulting directory (e.g. neo4j-community-1.9.M04) you'll find: • configuration files under conf/ • database files under data/graph.db • logs under data/log and data/graph.db/messages.log • start and other scripts under bin/ • Start the Neo4j Server with bin/neo4j start in a console or terminal of your choice, then open http://localhost:7474 for the web interface.

Gremlin • Written in Groovy (used in Grails) • (or use Java) • Terminology • Vertex • Edge • Query: locate vertices, list properties of vertices, filter vertices by properties, traverse edges from vertices
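The four query operations listed above can be mimicked in plain Python. The graph is a small hypothetical example and the function names are invented; `out` only echoes the name of Gremlin's step for following outgoing edges and is not the Gremlin API itself.

```python
# Hypothetical toy graph: vertex id -> properties, plus labelled directed edges.
VERTICES = {
    1: {"name": "marko", "age": 29},
    2: {"name": "vadas", "age": 27},
    3: {"name": "lop", "lang": "java"},
    4: {"name": "josh", "age": 32},
}
EDGES = [(1, "knows", 2), (1, "knows", 4), (1, "created", 3), (4, "created", 3)]


def locate(*ids):
    # Locate vertices by id, skipping ids that do not exist.
    return [v for v in ids if v in VERTICES]


def properties(vid):
    # List the properties of one vertex (a copy, so callers cannot mutate the graph).
    return dict(VERTICES[vid])


def filter_by(vids, key, predicate):
    # Keep vertices that have the property and whose value passes the predicate.
    return [v for v in vids if key in VERTICES[v] and predicate(VERTICES[v][key])]


def out(vids, label):
    # Traverse outgoing edges with the given label from each vertex.
    return [dst for v in vids for src, lab, dst in EDGES if src == v and lab == label]
```

Chaining calls such as `out(out([1], "knows"), "created")` expresses a two-hop traversal.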

Pipes • Each Gremlin step takes a collection and outputs another collection • The collections hold vertices, edges, or property values • Each step is a “pipe”
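The pipe idea (each step consumes one collection and produces another) maps naturally onto Python generators. In this hypothetical sketch, `pipeline` feeds each step's output into the next, and the steps move from vertices to vertices (`out`), from vertices to property values (`prop`), and from values to distinct values (`dedup`); none of these names are Gremlin's API.

```python
from functools import reduce

# Hypothetical data: vertex id -> properties, plus labelled directed edges.
PROPS = {1: {"name": "marko"}, 2: {"name": "vadas"}, 3: {"name": "lop"},
         4: {"name": "josh"}, 5: {"name": "ripple"}}
EDGES = [(1, "knows", 2), (1, "knows", 4), (1, "created", 3),
         (2, "created", 3), (4, "created", 3), (4, "created", 5)]


def pipeline(source, *steps):
    """Feed the output collection of each step into the next one."""
    return reduce(lambda stream, step: step(stream), steps, iter(source))


def out(edges, label):
    # Pipe: vertices -> vertices reached by edges with this label.
    def step(vertices):
        for v in vertices:
            for src, lab, dst in edges:
                if src == v and lab == label:
                    yield dst
    return step


def prop(props, key):
    # Pipe: vertices -> values of one property.
    def step(vertices):
        for v in vertices:
            if key in props[v]:
                yield props[v][key]
    return step


def dedup():
    # Pipe: items -> items with repeats removed, first occurrence kept.
    def step(items):
        seen = set()
        for item in items:
            if item not in seen:
                seen.add(item)
                yield item
    return step
```

`list(pipeline([1], out(EDGES, "knows"), out(EDGES, "created"), prop(PROPS, "name"), dedup()))` returns `["lop", "ripple"]`. Because the steps are generators, items flow through one at a time rather than each step materializing its whole collection.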

Cypher • START – at a node • The start node is found via a node ID, a list of node IDs, or an index lookup • MATCH – for examining patterns in relationships • WHERE – filters on properties of a node or relationship • RETURN – defines the result set: nodes, relationships, or properties of nodes or relationships • ORDER BY and aggregation functions (e.g. count), too • See http://docs.neo4j.org/chunked/milestone/cypher-query-lang.html
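To show what each clause contributes, the sketch below walks through START, MATCH, WHERE, RETURN and ORDER BY by hand over hypothetical in-memory data. It roughly corresponds to a legacy START-style query like the one in the comment (assuming that syntax); the Python function only illustrates the clause semantics and is not a Cypher engine.

```python
# Roughly: START n=node(1) MATCH (n)-[:KNOWS]->(friend)
#          WHERE friend.age > 30 RETURN friend.name ORDER BY friend.name

# Hypothetical data: node id -> properties, plus (start, type, end) relationships.
NODES = {1: {"name": "Ann"}, 2: {"name": "Bob", "age": 34},
         3: {"name": "Cy", "age": 25}, 4: {"name": "Dee", "age": 41}}
RELS = [(1, "KNOWS", 2), (1, "KNOWS", 3), (1, "KNOWS", 4), (2, "KNOWS", 3)]


def run_query(start_ids, rel_type, where, return_key):
    # START: bind the starting node(s), here by node ID.
    starts = [n for n in start_ids if n in NODES]
    # MATCH: find every (start)-[:rel_type]->(friend) pattern.
    matches = [(s, e) for s in starts for (a, t, e) in RELS if a == s and t == rel_type]
    # WHERE: keep only matches whose end node passes the property filter.
    matches = [(s, e) for s, e in matches if where(NODES[e])]
    # RETURN ... ORDER BY: project one property and sort on it.
    return sorted(NODES[e][return_key] for _, e in matches)
```

`run_query([1], "KNOWS", lambda p: p.get("age", 0) > 30, "name")` returns `["Bob", "Dee"]`. An aggregation such as `count` would simply be the `len()` of the projected list.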

Second assignment: to be further described in class • Build a MongoDB database using RockMongo (or another GUI) or the Mongo shell • The database will contain a set of documents describing content of your choice; use either your GUI or the Mongo shell to do the following • The documents must support and show embedding • Create at least ten documents • Delete documents • Search documents and return non-empty sets • Perform at least two map-reduce operations • Contact the grader at haojie.hang@gmail.com to demo

Third assignment: lookahead… • Build a Neo4J database with Cypher • The Neo4J database will • Use Cypher to…
