Survey of Graph Database Models

Survey of Graph Database Models. Byoung Ju Yang, 2011. 04. 01. IDS Lab., Seoul National University. Based on: Survey of Graph Database Models, Renzo Angles, Claudio Gutierrez, ACM Computing Surveys, Vol. 40, No. 1, Article 1 (2008)

Presentation Transcript


  1. Survey of Graph Database Models Byoung Ju Yang 2011. 04. 01. IDS Lab., Seoul National University

  2. Table of contents • Survey of Graph Database Models • Renzo Angles, Claudio Gutierrez • ACM Computing Surveys, Vol. 40, No. 1, Article 1 (2008) • Data structures, Query languages, and Integrity constraints 1. Introduction 2. Graph Data Modeling 3. Graph Database Models (~2002) • The latest Graph Database Models • Neo4j, FlockDB • Blueprints • Sharding

  3. 1. Introduction

  4. 2-1. What is a Graph Data Model? • Data Structure(Schema) • Represented by graph, or by data structure generalizing the notion of graph(hypergraph) - (un)labeled, (un)directed • Separation between schema and data in most cases. • Data Manipulation (Query languages) • Expressed by graph transformations, or by operations whose main primitives are on graph features like paths, neighborhoods, subgraphs, graph patterns, connectivity, and graph statistics. • Integrity constraints • Enforce graph data consistency
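
The three ingredients above (data structure, query primitives, integrity constraints) can be illustrated with a very small sketch. The `Graph` class and its method names are hypothetical and are not taken from the survey; the sketch assumes a directed graph with labeled nodes and labeled edges.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """A directed graph with labeled nodes and labeled edges (illustrative only)."""
    nodes: dict = field(default_factory=dict)  # node id -> node label
    edges: set = field(default_factory=set)    # (source, edge label, target)

    def add_node(self, node_id, label):
        self.nodes[node_id] = label

    def add_edge(self, source, label, target):
        # Integrity constraint: an edge may only connect existing nodes.
        if source not in self.nodes or target not in self.nodes:
            raise ValueError("edge endpoints must be existing nodes")
        self.edges.add((source, label, target))

    def neighbors(self, node_id, label=None):
        """Query primitive: nodes one outgoing edge away, optionally by edge label."""
        return sorted(t for s, l, t in self.edges
                      if s == node_id and (label is None or l == label))


g = Graph()
g.add_node("p1", "Person")
g.add_node("p2", "Person")
g.add_node("c1", "City")
g.add_edge("p1", "knows", "p2")
g.add_edge("p1", "lives_in", "c1")
print(g.neighbors("p1", "knows"))
```

Rejecting dangling edges at insertion time is only one possible integrity constraint; richer models also check edges against a schema.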

  5. 2-2. Why a Graph Data Model? • It allows for a more natural modeling of data • Being able to keep all the information about an entity in a single node and showing related information by arcs connected to it. • Queries can refer directly to this graph structure • Such as finding shortest paths, determining certain subgraphs, and so forth. • For implementation, graph databases may provide special graph storage structures and efficient graph algorithms for realizing specific operations.
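
As an illustration of a query that refers directly to the graph structure, the following sketch finds a shortest path by breadth-first search over an adjacency mapping. The function name and the sample data are assumptions made for this example; a graph database would typically offer such an operation as a built-in.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Return one shortest path (fewest edges) from start to goal, or None."""
    if start == goal:
        return [start]
    parent = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt in parent:
                continue
            parent[nxt] = node
            if nxt == goal:
                # Walk the parent links back to the start.
                path = [goal]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


adjacency = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
print(shortest_path(adjacency, "a", "e"))
```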

  6. 2-3. Comparison with other DB Models • Physical DB Models • Hierarchical (1976), network (1976) models • Lack a good abstraction level • Relational DB Models • Introduced a separation between the physical and logical levels • Landmark development (mathematical foundation) • Geared toward simple record-type data (schema is known) • Not easy to integrate different schemas • The query language cannot explore the underlying graph of relationships among the data (paths, neighborhoods, patterns)

  7. 2-3. Comparison with other DB Models • Semantic DB Models • DB designer can represent objects and their relations in a natural and clear manner by using high-level abstraction concepts (E-R) • Relevant to graph DB (graph-like structures) • Object-oriented DB Models • For data-intensive domains (knowledge bases, eng. applications) • Permit much richer structures but still require predefined schema • Related to graph DB (use graph structures in definitions) • Semi-structured DB Models • Irregular, implicit, and partial structures

  8. 2-4. Motivations and Applications • Motivations • Real-life applications where component interconnectivity is a key feature • Applications • Classical applications • Complex networks - Social networks (people, groups) - Information networks (citation, word thesaurus) - Technological networks (spatial and geographical) - Biological networks (genomics)

  9. 3-1. Brief historical overview

  10. 3-2. Data Structures [Figure: example data with nodes Person1, Person2, Person3 and name/key attributes] • Hypernode • A simple flat graph is not good at presenting information to the user • Hypernodes provide inherent support for nesting (nested graphs) • Hypergraph • Generalization of a graph • A 2-uniform hypergraph is a graph
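
The statement that a 2-uniform hypergraph is a graph can be checked mechanically. In this sketch a hypergraph is assumed to be a list of hyperedges, each a set of node names; the helper names are hypothetical.

```python
def is_k_uniform(hyperedges, k):
    """A hypergraph is k-uniform when every hyperedge joins exactly k nodes."""
    return all(len(edge) == k for edge in hyperedges)


def to_graph_edges(hyperedges):
    """Convert a 2-uniform hypergraph into ordinary undirected edges."""
    if not is_k_uniform(hyperedges, 2):
        raise ValueError("only a 2-uniform hypergraph is an ordinary graph")
    return sorted(tuple(sorted(edge)) for edge in hyperedges)


# One hyperedge joins three nodes, so this is not an ordinary graph.
hypergraph = [frozenset({"a", "b", "c"}), frozenset({"c", "d"})]
# Every hyperedge joins exactly two nodes, so this is one.
plain = [frozenset({"a", "b"}), frozenset({"b", "c"})]

print(is_k_uniform(hypergraph, 2), to_graph_edges(plain))
```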

  11. 3-3. Integrity Constraints • Schema-instance consistency • The instance should contain only concrete entities and relations from entity types and relations that were defined in the schema • Schema-instance separation • In most models there is a separation • An exception is the hypernode (dynamic DB) • Integrity constraints concentrate on the creation of consistent instances and the correct identification and reference of entities.
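
Schema-instance consistency can be sketched as a check of an instance against a schema. The schema layout used here (a set of node types plus a set of allowed `(source type, label, target type)` triples) and the function name are assumptions for illustration, not a format defined by the survey.

```python
def check_instance(schema, nodes, edges):
    """Return a list of human-readable violations; empty means consistent."""
    violations = []
    for node_id, node_type in nodes.items():
        if node_type not in schema["node_types"]:
            violations.append(f"node {node_id}: undefined type {node_type}")
    for source, label, target in edges:
        # Correct reference of entities: both endpoints must exist.
        if source not in nodes or target not in nodes:
            violations.append(f"edge {source}-{label}->{target}: unknown endpoint")
            continue
        # The relation must be defined in the schema for these entity types.
        signature = (nodes[source], label, nodes[target])
        if signature not in schema["edge_types"]:
            violations.append(f"edge {source}-{label}->{target}: not allowed by schema")
    return violations


schema = {
    "node_types": {"Person", "City"},
    "edge_types": {("Person", "knows", "Person"), ("Person", "lives_in", "City")},
}
good_nodes = {"p1": "Person", "p2": "Person", "c1": "City"}
good_edges = [("p1", "knows", "p2"), ("p1", "lives_in", "c1")]
bad_nodes = {"p1": "Person", "x1": "Robot"}
bad_edges = [("p1", "knows", "x1"), ("p1", "lives_in", "c9")]

print(check_instance(schema, good_nodes, good_edges))
print(len(check_instance(schema, bad_nodes, bad_edges)))
```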

  12. 3-4. Query and Manipulation Languages • There is substantial work focused on query languages, the problem of querying graphs, the visual presentation of results, and graphical query languages • Some graph-oriented object models regard database transformations as graph transformations based on graph-pattern matching • GOOD, GOAL, etc.

  13. 3. Summary

  14. NoSQL Databases • Schema-less • Shared nothing architecture • Each server uses only its own local storage (faster) • Elasticity • Able to add servers without downtime • Sharding • Asynchronous replication • BASE instead of ACID

  15. NoSQL Database Models

  16. Graph Database Models • Scalability • ACID vs. BASE • Complexity • Relational - Strengths: no redundancy or information loss (normalization), powerful SQL, optimization by the RDBMS - Weaknesses: performance problems in deep queries (many joins), no schema evolution, etc. • Graph: property graph model
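
The cost of deep queries can be illustrated by comparing two ways of following a relationship several hops. This is a simplified sketch: the join version rescans the whole edge table on every hop, whereas the traversal version touches only the adjacency lists of the current frontier. A real RDBMS would use indexes and a query optimizer, so the sketch shows the shape of the work rather than measured performance. All names and data are made up for the example.

```python
from collections import defaultdict

# A "knows" relation stored as rows of (source, destination).
knows = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("ann", "eve")]


def reach_by_joins(rows, start, depth):
    """Relational style: one pass over the entire table per hop (a self-join)."""
    frontier = {start}
    for _ in range(depth):
        frontier = {dst for src, dst in rows if src in frontier}
    return frontier


def reach_by_traversal(adjacency, start, depth):
    """Graph style: follow only the edges leaving the current frontier."""
    frontier = {start}
    for _ in range(depth):
        frontier = {dst for src in frontier for dst in adjacency.get(src, ())}
    return frontier


adjacency = defaultdict(list)
for src, dst in knows:
    adjacency[src].append(dst)

print(reach_by_joins(knows, "ann", 3), reach_by_traversal(adjacency, "ann", 3))
```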

  17. The latest Graph Database Models • AllegroGraph RDFStore • HyperGraphDB • InfoGrid • Neo4j • FlockDB • Sones • Virtuoso

  18. The latest Graph Database Models • License • Distribution • The only truly distributed solution is HyperGraphDB • Indexing • In Neo4j, indexing is not the default behavior (indexing via Lucene, Solr) • Storage system • General vs. special-purpose • HyperGraphDB uses Berkeley DB • APIs • Most of them provide Java and web APIs

  19. Neo4j • Fully ACID-compliant transactional graph DB written in Java • High performance • Handles several billion nodes, relationships and properties • 1-2 million traversals per second - constant time (independent of total size) • Example code • Node creation • Find friend

  20. Neo4j • Example code • Traversal • Indexing

  21. Neo4j

  22. FlockDB • Goals • High rate of add/update/remove operations • Complex set arithmetic queries • Paging through query result sets containing millions of entries • Ability to ‘archive’ and later restore archived edges • Horizontal scaling including replication • Non-goals • Multi-hop queries (or graph-walking queries) • Automatic shard migrations • Characteristics • Optimized for very large adjacency lists (no traversal)
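
Two of the goals above, set arithmetic and paging through large result sets, can be sketched over sorted adjacency lists. This is not FlockDB's API; the function names, the cursor convention, and the sample data are assumptions made for illustration.

```python
import bisect


def intersect(a, b):
    """Set arithmetic: ids present in both adjacency lists, kept sorted."""
    return sorted(set(a) & set(b))


def page(ids, cursor, count):
    """Return up to `count` ids greater than `cursor`, plus the next cursor.

    `ids` must be sorted. The next cursor is None when no results remain.
    """
    start = bisect.bisect_right(ids, cursor)
    chunk = ids[start:start + count]
    next_cursor = chunk[-1] if start + count < len(ids) else None
    return chunk, next_cursor


follows_of_1 = [2, 3, 5, 8, 9]
follows_of_2 = [3, 5, 7, 9]
common = intersect(follows_of_1, follows_of_2)

first, cursor = page(follows_of_1, 0, 2)
second, cursor2 = page(follows_of_1, cursor, 2)
print(common, first, second)
```

Paging by a cursor value rather than by a numeric offset keeps each page a cheap range lookup even when the list has millions of entries.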

  23. FlockDB - Twitter • Previous models (could not have both) • Relational tables – handling write operations • Key-value storage – paging through giant result sets • Implementation goals • Write the simplest possible thing that could work • Use off-the-shelf MySQL as the storage engine • Allow horizontal partitioning • Allow write operations to arrive out of order or be processed more than once (allow redundant work rather than lost work) • Twitter (April 2010) • More than 13 billion edges, 20k writes/second, 100k reads/second
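
One way to tolerate writes that arrive out of order or more than once is to attach a timestamp to every write and keep only the newest state per edge. The sketch below assumes that scheme; it illustrates the idea and is not a description of FlockDB's actual implementation. The field names are hypothetical.

```python
from itertools import permutations


def apply_write(store, write):
    """Apply a write only if it is newer than what is stored for that edge."""
    key = (write["source"], write["dest"])
    current = store.get(key)
    if current is None or write["ts"] > current["ts"]:
        store[key] = {"state": write["state"], "ts": write["ts"]}


def replay(writes):
    """Apply a sequence of writes to an empty store and return the result."""
    store = {}
    for write in writes:
        apply_write(store, write)
    return store


writes = [
    {"source": 1, "dest": 2, "state": "normal", "ts": 10},
    {"source": 1, "dest": 2, "state": "removed", "ts": 20},
    {"source": 1, "dest": 2, "state": "normal", "ts": 30},
]
# Every arrival order, including a duplicated delivery, ends in the same state.
results = [replay(list(order) + [writes[1]]) for order in permutations(writes)]
print(results[0])
```

Because applying the same write twice changes nothing and the order of application does not matter, redundant work is harmless, which matches the stated preference for redundant work over lost work.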

  24. FlockDB - Twitter • Stores graphs as sets of edges • Primary key (a compound key of the source ID, state, and position) • When an edge is deleted, the row is just marked ‘removed’ without being deleted from MySQL • Keeps only a compound primary key and a secondary index for each row, and answers all queries from a single index.
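
The effect of a compound key of (source ID, state, position) can be sketched with a sorted in-memory list standing in for the MySQL index. The class below is a hypothetical illustration, not FlockDB code: removing an edge changes its state instead of deleting the row, and listing the live edges of a source is a single range scan.

```python
import bisect

NORMAL, REMOVED = 0, 1


class EdgeIndex:
    """Rows kept sorted by (source_id, state, position, dest_id)."""

    def __init__(self):
        self._rows = []

    def add(self, source_id, dest_id, position):
        bisect.insort(self._rows, (source_id, NORMAL, position, dest_id))

    def remove(self, source_id, dest_id):
        """Mark an edge as removed; the row stays in the index."""
        # Linear scan for brevity; a real store would look the row up by index.
        for i, (src, state, pos, dst) in enumerate(self._rows):
            if src == source_id and dst == dest_id and state == NORMAL:
                del self._rows[i]
                bisect.insort(self._rows, (src, REMOVED, pos, dst))
                return True
        return False

    def destinations(self, source_id, state=NORMAL):
        """Range scan over the key prefix (source_id, state), ordered by position."""
        lo = bisect.bisect_left(self._rows, (source_id, state))
        hi = bisect.bisect_left(self._rows, (source_id, state + 1))
        return [dst for _, _, _, dst in self._rows[lo:hi]]

    def __len__(self):
        return len(self._rows)


idx = EdgeIndex()
idx.add(1, 20, position=100)
idx.add(1, 30, position=200)
idx.add(2, 20, position=50)
idx.remove(1, 20)
print(idx.destinations(1), idx.destinations(1, REMOVED), len(idx))
```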

  25. Sharding in Graph DB • Especially hard in graph DBs because of traversal • Unless we store the entire graph on a single machine, we are forced to query across machine boundaries (expensive) • Neo4j provides a master/slave structure (which still has limits) • FlockDB (Twitter) does not address this (it is interested only in 1-level relations)
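
Why traversal makes sharding hard can be seen by counting the edges whose endpoints land on different machines. The sketch assumes integer node ids and a simple modulo placement rule; both are assumptions for illustration, not the scheme of any particular product.

```python
def shard_of(node_id, num_shards):
    """Assumed placement rule: node id modulo the number of shards."""
    return node_id % num_shards


def cut_edges(edges, num_shards):
    """Edges whose endpoints live on different shards; following one costs a network hop."""
    return [(s, d) for s, d in edges
            if shard_of(s, num_shards) != shard_of(d, num_shards)]


edges = [(1, 2), (2, 3), (3, 4), (4, 6), (1, 5)]
print(len(cut_edges(edges, 1)), len(cut_edges(edges, 2)))
```

On a single machine no edge is cut. With two shards and this placement rule, three of the five edges cross a machine boundary, so a multi-hop traversal would repeatedly leave the local machine.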

  26. How to shard? • A proposal: gravity • Localizing data leads to greater performance (like cache) • Shard graph data based on gravity

  27. Blueprints • A collection of interfaces and related components for the property graph DB model • Analogous to JDBC, but for graph DBs • Provides a common set of interfaces that lets developers plug and play their graph DB backend (Pipes, Gremlin, Rexster)
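
The plug-and-play idea can be sketched as an abstract interface with interchangeable backends. Blueprints itself is a Java library; the Python class and method names below are hypothetical and do not reproduce its actual API. The point is only that application code depends on the interface, not on a specific backend.

```python
from abc import ABC, abstractmethod


class PropertyGraph(ABC):
    """Hypothetical common interface that every backend must implement."""

    @abstractmethod
    def add_vertex(self, vertex_id, **properties): ...

    @abstractmethod
    def add_edge(self, source, label, target): ...

    @abstractmethod
    def out_neighbors(self, vertex_id, label): ...


class InMemoryGraph(PropertyGraph):
    """One possible backend; another could wrap a real graph database."""

    def __init__(self):
        self.vertices = {}
        self.edges = []

    def add_vertex(self, vertex_id, **properties):
        self.vertices[vertex_id] = properties

    def add_edge(self, source, label, target):
        self.edges.append((source, label, target))

    def out_neighbors(self, vertex_id, label):
        return [t for s, l, t in self.edges if s == vertex_id and l == label]


def friends_of_friends(graph: PropertyGraph, person):
    """Application code written only against the interface."""
    direct = set(graph.out_neighbors(person, "knows"))
    second = {fof for f in direct for fof in graph.out_neighbors(f, "knows")}
    return sorted(second - direct - {person})


graph = InMemoryGraph()
for name in ("ann", "bob", "cat"):
    graph.add_vertex(name, name=name)
graph.add_edge("ann", "knows", "bob")
graph.add_edge("bob", "knows", "cat")
graph.add_edge("bob", "knows", "ann")
print(friends_of_friends(graph, "ann"))
```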
