
Graph databases


ashtyn

Presentation Transcript


  1. Graph databases …the other end of the NoSQL spectrum. Material taken from NoSQL Distilled and Seven Databases in Seven Weeks

  2. Master-slave replication • The master holds the primary copies • This provides high availability for reads • Many applications are very read-heavy • Sometimes the role of master can be moved to another machine • A master/slave environment can lead to inconsistency, as in any replicated database
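The read-availability and inconsistency points above can be illustrated with a minimal in-memory sketch. The `Master` and `Replica` classes, the explicit `propagate` step, and the sample data are hypothetical names chosen for this illustration; they are not taken from the slides or from any database API.

```python
class Replica:
    """A read-only copy that lags behind the master until it is synced."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class Master:
    """Holds the primary copies and ships changes to replicas on demand."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas
        self.pending = []  # changes not yet shipped to the replicas

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))

    def propagate(self):
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


replicas = [Replica(), Replica()]
master = Master(replicas)

master.write("alice", {"city": "Boulder"})
stale = replicas[0].read("alice")   # replica has not caught up yet
master.propagate()
fresh = replicas[0].read("alice")   # replica now matches the master
```

Between `write` and `propagate`, a reader on a replica sees older data than a reader on the master, which is the inconsistency window the slide refers to.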

  3. Fundamental components • Nodes • Properties • A query is a graph traversal … in effect, path expressions
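As a rough sketch of these components, the snippet below models nodes with properties and treats a query as a chain of traversal steps, that is, a path expression. The `Node` class, the `relate` and `follow` helpers, and the `KNOWS` relationship type are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    properties: dict = field(default_factory=dict)
    outgoing: list = field(default_factory=list)  # (relationship type, target Node)


def relate(source, rel_type, target):
    """Attach a directed relationship from source to target."""
    source.outgoing.append((rel_type, target))


def follow(start_nodes, rel_type):
    """One step of a path expression: follow rel_type from every start node."""
    return [target
            for node in start_nodes
            for (kind, target) in node.outgoing
            if kind == rel_type]


alice = Node(1, {"name": "Alice"})
bob = Node(2, {"name": "Bob"})
carol = Node(3, {"name": "Carol"})
relate(alice, "KNOWS", bob)
relate(bob, "KNOWS", carol)

# "Friends of friends" is two traversal steps chained together.
friends_of_friends = follow(follow([alice], "KNOWS"), "KNOWS")
names = [n.properties["name"] for n in friends_of_friends]
```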

  4. Details • There is no particular semantic limit on the number of relationships a node can take part in • Normally we don’t use multiple servers • Using Neo4J as our example • Relationships are unidirectional • Relationships can have properties, which can be single values or collections

  5. Applications for graph DBs • Social networks • Learning systems • Systems that must route energy, messages, email, etc. • Mapping and location-aware systems

  6. Transactions • ACID transactions • Within a server, ACID properties hold • In a cluster, an update to a primary copy (the copy on the master server) is eventually propagated to all replicas (on the slaves), and copies on slaves are always available for reading • In a cluster, writes to slaves may not be immediately propagated to the master, and updates to other slaves might take time, but the master is committed first

  7. Constraints and transactions • A relationship must always have two nodes (a start node and an end node) • A node can be deleted only if it has no relationships • A read does not have to be packaged as a transaction • To change a node or relationship, the change must be wrapped in a transaction • There is a rollback operation for failed transactions
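The constraints and the rollback behavior can be sketched with a small in-memory store. This is a simplified illustration under stated assumptions: `GraphStore` and `Transaction` are hypothetical classes, and the snapshot-and-restore rollback is a stand-in technique, not how Neo4J implements transactions.

```python
import copy


class GraphStore:
    def __init__(self):
        self.nodes = {}          # node id -> properties
        self.relationships = {}  # relationship id -> (start id, type, end id)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_relationship(self, rel_id, start, rel_type, end):
        # Constraint: a relationship must always have two nodes.
        if start not in self.nodes or end not in self.nodes:
            raise ValueError("a relationship needs two existing nodes")
        self.relationships[rel_id] = (start, rel_type, end)

    def delete_node(self, node_id):
        # Constraint: a node must have no relationships to be deleted.
        for start, _, end in self.relationships.values():
            if node_id in (start, end):
                raise ValueError("node still has relationships")
        del self.nodes[node_id]


class Transaction:
    """Context manager: snapshot on entry, restore the snapshot on failure."""

    def __init__(self, store):
        self.store = store

    def __enter__(self):
        self.snapshot = copy.deepcopy(
            (self.store.nodes, self.store.relationships))
        return self.store

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # Roll back every change made inside the failed transaction.
            self.store.nodes, self.store.relationships = self.snapshot
        return False  # re-raise so the caller sees the failure


store = GraphStore()
with Transaction(store) as tx:
    tx.add_node(1, name="Alice")
    tx.add_node(2, name="Bob")
    tx.add_relationship(10, 1, "KNOWS", 2)

try:
    with Transaction(store) as tx:
        tx.add_node(3, name="Carol")
        tx.delete_node(1)  # fails: node 1 still has a relationship
except ValueError:
    pass

node_ids = sorted(store.nodes)  # node 3 is gone: it was rolled back too
```

The second transaction shows why rollback matters: the insert of node 3 succeeded on its own, but it is undone because a later step in the same transaction failed.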

  8. Programs on graph DBs • There are graph traversal languages • (A big thing on the semantic web, actually, where two assertions can be linked to create an inference) • In Neo4J we can index nodes by their properties • It uses Lucene, a high-powered indexing engine • We can also traverse a graph via relationships • We can also find all nodes related to a given node by some particular relationship • We can search for the shortest path between two given nodes
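To make the shortest-path bullet concrete, here is a minimal sketch using breadth-first search over an adjacency list. It assumes unweighted relationships, so "shortest" means fewest hops, and it is a generic textbook technique rather than a description of Neo4J's own implementation. The sample graph is made up.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of nodes, or None if unreachable."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Directed adjacency list: node -> nodes reachable over one relationship.
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
}
path = shortest_path(graph, "A", "E")
```

Because relationships are directed, `shortest_path(graph, "E", "A")` finds nothing in this sample graph.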

  9. Issues of scale • Distribution can be done via sharding in key-value and document NoSQL databases • A graph database is not aggregate-oriented, so sharding is difficult: traversing a relationship may require crossing a server boundary • In a graph db, we can try to keep a large part of the graph in memory • We can add slaves to scale up read access • We can shard carefully, using knowledge of how the relationships are commonly traversed
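The cost of sharding a graph can be estimated by counting relationships whose two endpoints land on different servers. The sketch below compares two hand-written shard assignments for a made-up graph; the node names, the assignments, and the `cross_shard_edges` helper are all illustrative assumptions.

```python
def cross_shard_edges(edges, shard_of):
    """Return the relationships whose endpoints live on different shards."""
    return [(a, b) for a, b in edges if shard_of[a] != shard_of[b]]


edges = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("carol", "alice"),
    ("dave", "erin"),
    ("carol", "dave"),
]

# Assignment that ignores the graph structure (e.g. as a hash might).
naive = {"alice": 0, "bob": 1, "carol": 0, "dave": 1, "erin": 0}

# Assignment that keeps tightly connected nodes on the same shard.
informed = {"alice": 0, "bob": 0, "carol": 0, "dave": 1, "erin": 1}

naive_cut = len(cross_shard_edges(edges, naive))
informed_cut = len(cross_shard_edges(edges, informed))
```

Every relationship in the cut means a traversal has to hop between servers, so the assignment with the smaller cut is the one that follows how the graph is actually connected.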

  10. Installing Neo4J • http://www.neo4j.org/install • Just unzip the file you downloaded; in the resulting directory (e.g. neo4j-community-1.9.M04) you'll find: • configuration files under conf/ • database files under data/graph.db • logs under data/log and data/graph.db/messages.log • startup and other scripts under bin/ • Start Neo4j Server with bin/neo4j start in a command-line console or terminal of your choice, then open http://localhost:7474 for the web interface.

  11. Gremlin • Written in Groovy (used in Grails) • (or use Java) • Terminology • Vertex • Edge • Query • Locate vertices • List properties of vertices • Filter vertices by properties • Traverse edges from vertices

  12. Pipes • Gremlin takes a collection and outputs another collection • Collections • Vertices • Edges • Property values • Each step is a “pipe”
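The pipe idea, where each step consumes one collection and emits another, can be imitated with Python generators. This is only an analogy: the step functions below (`all_vertices`, `has`, `out`, `values`) are hypothetical helpers written for this sketch, not Gremlin's API, and the sample graph is made up.

```python
graph = {
    "vertices": {
        1: {"name": "Alice", "kind": "person"},
        2: {"name": "Bob", "kind": "person"},
        3: {"name": "Neo4j", "kind": "database"},
    },
    "edges": [(1, "knows", 2), (1, "uses", 3), (2, "uses", 3)],
}


def all_vertices(g):
    """Source pipe: emit every vertex id."""
    yield from g["vertices"]


def has(ids, g, key, value):
    """Filter pipe: keep vertices whose property matches."""
    return (v for v in ids if g["vertices"][v].get(key) == value)


def out(ids, g, label):
    """Traversal pipe: follow outgoing edges with the given label."""
    for v in ids:
        for src, edge_label, dst in g["edges"]:
            if src == v and edge_label == label:
                yield dst


def values(ids, g, key):
    """Property pipe: turn vertices into property values."""
    return (g["vertices"][v][key] for v in ids)


# Locate a vertex, traverse an edge, then list a property of the results.
pipeline = values(
    out(has(all_vertices(graph), graph, "name", "Alice"), graph, "knows"),
    graph, "name")
result = list(pipeline)
```

Nothing is computed until `list` pulls items through the chain, which mirrors how a pipeline of steps passes elements along one at a time.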

  13. Cypher • START – at a node • Found via node ID, list of node IDs, or by an index lookup • MATCH – for examining patterns in relationships • WHERE – filters on properties of a node or relationship • RETURN – defines the result set: nodes, relationships, or properties of nodes or relationships • ORDER BY and aggregation functions, too • See http://docs.neo4j.org/chunked/milestone/cypher-query-lang.html
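To show how the clauses fit together, the sketch below walks through the same stages in plain Python over a tiny in-memory graph. It is an analogy for the order of evaluation, not Cypher itself; the data, the `KNOWS` relationship type, and the age filter are made up. The comment at the top shows roughly what the corresponding query might look like in the START-based syntax described on the slide.

```python
# Roughly: START a=node(1) MATCH (a)-[:KNOWS]->(b)
#          WHERE b.age > 30 RETURN b.name ORDER BY b.name

nodes = {
    1: {"name": "Alice", "age": 34},
    2: {"name": "Bob", "age": 27},
    3: {"name": "Carol", "age": 41},
    4: {"name": "Dave", "age": 38},
}
rels = [(1, "KNOWS", 2), (1, "KNOWS", 3), (1, "KNOWS", 4), (2, "KNOWS", 3)]

# START: pick the starting node(s), here by node ID.
start = [1]

# MATCH: find every (a)-[:KNOWS]->(b) pattern anchored at a start node.
matched = [(a, b)
           for a in start
           for (src, rel_type, b) in rels
           if src == a and rel_type == "KNOWS"]

# WHERE: filter on a property of the matched node.
filtered = [(a, b) for a, b in matched if nodes[b]["age"] > 30]

# RETURN: project the properties we want back.
returned = [nodes[b]["name"] for _, b in filtered]

# ORDER BY and an aggregate (a count) over the result.
ordered = sorted(returned)
count = len(returned)
```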

  14. Second assignment: to be further described in class • Build a Mongo DB using RockMongo (or other GUI) or the Mongo shell • The Mongo DB will contain a set of documents describing content of your choice – use either your GUI or the Mongo shell to do the following • The documents must support and show embedding • Create at least ten documents • Delete documents • Search documents and return non-empty sets • Perform at least two map reduce operations • Contact the grader at: haojie.hang@gmail.com to demo

  15. Third assignment: lookahead… • Build a Neo4J database with Cypher • The Neo4J database will • Use Cypher to…
