Aules d’empresa 2011 DEX


  1. Aules d’empresa 2011 DEX

  2. Contents • Graph database • Motivation • DEX • Experiments

  3. Graph database • What is a graph database? • Data and schema are represented by graphs. • Nodes, edges, and properties. • Data manipulation is expressed as graph operations. • Integrity constraints enforce graph consistency.
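
The ideas on this slide (nodes, edges, properties, graph operations, integrity constraints) can be sketched in a few lines of Python. This is a toy in-memory structure for illustration only: the class name `PropertyGraph`, its methods, and the sample data are hypothetical and are not the DEX API.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Toy in-memory property graph (hypothetical, not the DEX API)."""
    nodes: dict = field(default_factory=dict)  # node id -> properties
    edges: list = field(default_factory=list)  # (src, dst, properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, src, dst, **props):
        # Integrity constraint: an edge may only connect existing nodes.
        if src not in self.nodes or dst not in self.nodes:
            raise ValueError("edge endpoints must be existing nodes")
        self.edges.append((src, dst, props))

    def neighbors(self, node_id):
        # Graph operation: follow the outgoing edges of one node.
        return [dst for src, dst, _ in self.edges if src == node_id]


# Made-up sample data.
g = PropertyGraph()
g.add_node("a1", kind="author", name="Ada")
g.add_node("p1", kind="paper", title="Graph databases")
g.add_edge("a1", "p1", kind="wrote", year=2011)
print(g.neighbors("a1"))
```

Rejecting an edge whose endpoints do not exist is one simple instance of a constraint that keeps the graph consistent.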

  4. Motivation
     • Trends in current data sets:
       • A higher degree of connectivity among entities.
       • A higher degree of complexity of data models.
       • Decentralization of data generation.
       • Users provide content.
     • Requirements:
       • Queries with different flavors:
         • Structural queries (not based on the schema).
         • Link analysis.
       • Manage unstructured data.
       • Flexible schemas.

  5. Scenarios
     • Social networks: MySpace, Facebook, Flickr …
     • Information networks:
       • Bibliographic databases: DBLP, Scopus …
       • On-line encyclopedias: Wikipedia …
     • Technological networks: electric power grids, airline routes, telephone networks …
     • Biological networks: genomics, chemical structures …

  6. Why not RDBMS?
     • Classical relational model:
       • Inefficient for unstructured data or flexible schemas: the schema is fixed in advance and based on relations (tables).
       • Inefficient for structural queries: they require intensive use of join operations.

  7. DEX, a graph database
     • DEX is a programming library for managing a graph database.
     • Focuses on:
       • Very large datasets.
       • High-performance query processing.

  8. Basic concepts
     • Programming library for managing persistent and temporary graphs.
     • Data model: typed and attributed directed multigraph.
       • Node and edge instances belong to a type (label).
       • Node and edge instances have attribute values.
       • Edges can be directed or undirected.
       • Multiple edges between two nodes are allowed.
     • Types of edges:
       • Materialized: directed and undirected.
       • Virtual: constrained by the values of two attributes (foreign keys); used only for navigation.
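
A minimal sketch of this data model, assuming a plain in-memory representation: node types, attribute values, parallel edges, undirected edges, and a virtual edge resolved by matching two attribute values. The class `TypedMultigraph`, its method names, and the sample nodes are hypothetical; the slides do not show how DEX implements or exposes any of this.

```python
from collections import defaultdict


class TypedMultigraph:
    """Toy typed, attributed multigraph (hypothetical, not the DEX API)."""

    def __init__(self):
        self.node_type = {}                  # node id -> type label
        self.node_attrs = defaultdict(dict)  # node id -> {attribute: value}
        self.edges = []                      # (edge type, src, dst, directed)

    def add_node(self, node_id, type_label, **attrs):
        self.node_type[node_id] = type_label
        self.node_attrs[node_id].update(attrs)

    def add_edge(self, edge_type, src, dst, directed=True):
        # A list keeps parallel edges between the same pair of nodes.
        self.edges.append((edge_type, src, dst, directed))

    def neighbors(self, node_id, edge_type):
        """Navigate materialized edges; undirected ones work both ways."""
        out = []
        for etype, src, dst, directed in self.edges:
            if etype != edge_type:
                continue
            if src == node_id:
                out.append(dst)
            elif dst == node_id and not directed:
                out.append(src)
        return out

    def virtual_neighbors(self, node_id, attr, target_type, target_attr):
        """Navigate a virtual edge: nothing is stored, values are matched."""
        value = self.node_attrs[node_id].get(attr)
        if value is None:
            return []
        return [n for n, t in self.node_type.items()
                if t == target_type
                and self.node_attrs[n].get(target_attr) == value]


# Made-up sample data.
g = TypedMultigraph()
g.add_node(1, "PAPER", title="First paper", venue_id=7)
g.add_node(2, "VENUE", id=7, name="ExampleConf")
g.add_node(3, "PAPER", title="Second paper", venue_id=7)
g.add_edge("CITES", 1, 3)
g.add_edge("CITES", 1, 3)                    # parallel edge
g.add_edge("RELATED", 1, 3, directed=False)  # undirected edge
print(g.virtual_neighbors(1, "venue_id", "VENUE", "id"))
```

The virtual edge here is computed on demand from `venue_id` and `id`, which mirrors the foreign-key idea on the slide: it supports navigation without storing an edge instance.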

  9. A graph model

  10. Software architecture

  11. Software architecture
      • Java library: jdex.jar, public API
      • Native library:
        • Linux: libjdex.so
        • Windows: jdex.dll
      • System requirements:
        • Java Runtime Environment, v1.5 or higher.
        • Operating system:
          • Windows: 32-bit
          • Linux: 32-bit and 64-bit

  12. Application architecture
      [Diagram: two stacks, a desktop application and a web application. Presentation: Java Swing application (desktop); browser with HTML + JavaScript (web). Network: Internet. Application logic: load and query; servlet; query API. Below these: DEX API, DEX, and the data layer with data sources and graphs.]

  13. Experiments
      • Five categories:
        • Bulk load performance.
        • Core operations performance and memory usage.
        • Scalability.
        • Comparison with other approaches: relational (MySQL) and OIM (Oriented Incidence Matrix).
        • Query performance analysis.
      • Different datasets:
        • Wikipedia.
        • IMDb, the Internet Movie Database.
        • XMark, a standard and scalable benchmark for XML.
        • LUBM, a benchmark to evaluate the performance of RDF repositories.
        • R-MAT, a synthetic scale-free network.

  14. Load performance
      • Hardware: single CPU with 4096 KB of cache, 2 GB of RAM, and 80 GB of disk.
      • Operating system: Debian Linux 4.0 (etch).
      • DEX buffer pool: 1.5 GB max.

  15. Operations performance and memory usage • Benchmark: Wikipedia, with more than 200 million nodes and edges.

  16. Scalability • XMark over 5 scale factors, ranging from 0.1 (110 MB) to 25 (2.78 GB).

  17. R-MAT scalability

  18. Comparison with other approaches • Comparison with a relational database (MySQL) and with an Oriented Incidence Matrix (OIM).

  19. Comparison with Neo4j
      • Query 1: max-outdegree + SPT
      • Query 2: paper recommender (2-hops)
      • Query 3: pattern matching
      • Query 4: for each language: number of papers and images
      • Query 5: for each paper: materialize number of images
      • Query 6: delete papers with no images
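
To make Query 1 concrete, here is a rough sketch under two assumptions that the slides do not confirm: that "SPT" stands for a shortest-path tree, and that edges are unweighted, so a breadth-first search is enough. The function names and the toy graph are hypothetical; this is not the benchmark code used against Neo4j.

```python
from collections import deque


def max_outdegree_node(adj):
    """Return the node with the most outgoing edges (first wins on ties)."""
    return max(adj, key=lambda n: len(adj[n]))


def shortest_path_tree(adj, root):
    """BFS over unweighted edges; returns {node: parent}, root maps to None."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return parent


# Hypothetical toy graph as adjacency lists: node -> list of successors.
adj = {"a": ["b", "c", "d"], "b": ["d"], "c": ["e"], "d": [], "e": ["a"]}
root = max_outdegree_node(adj)
tree = shortest_path_tree(adj, root)
print(root, tree)
```

Both steps are purely structural: they depend only on the edges, not on any schema or attribute.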

  20. Another comparison with an RDBMS
      • Datasets:
        • D1: synthetic data, generated from R-MAT.
          • Scale factor = 16 (524K edges)
        • D2: synthetic data, generated from R-MAT.
          • Scale factor = 18 (2M edges)
        • D1 and D2 both contain just nodes and edges, no attributes.
        • R-MAT generates scale-free networks.
      • Queries:
        • Q1: 3-hops from a given node.
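
For readers unfamiliar with R-MAT, a minimal generator sketch follows. The edge counts quoted on this slide match 8 · 2^16 = 524,288 and 8 · 2^18 = 2,097,152, so the sketch assumes 2^scale node ids and 8 edges per node. That reading, the quadrant probabilities (commonly used defaults, not taken from the slides), and the function name `rmat_edges` are all assumptions.

```python
import random


def rmat_edges(scale, edge_factor=8, probs=(0.57, 0.19, 0.19, 0.05), seed=0):
    """Generate edge_factor * 2**scale directed edges over 2**scale node ids.

    Assumed parameters: edge_factor and probs are illustrative defaults.
    Duplicate edges and self-loops are kept.
    """
    rng = random.Random(seed)
    a, b, c, _d = probs  # quadrants: top-left, top-right, bottom-left, bottom-right
    edges = []
    for _edge in range(edge_factor * 2 ** scale):
        src = dst = 0
        for _bit in range(scale):
            # Each step picks one quadrant of the adjacency matrix,
            # which fixes one bit of the source and destination ids.
            r = rng.random()
            src_bit = 1 if r >= a + b else 0
            dst_bit = 1 if (a <= r < a + b) or r >= a + b + c else 0
            src = (src << 1) | src_bit
            dst = (dst << 1) | dst_bit
        edges.append((src, dst))
    return edges


edges = rmat_edges(scale=10)  # small scale so the example runs quickly
print(len(edges))             # 8 * 2**10 = 8192
```

Because the top-left quadrant is chosen most often, low-numbered nodes collect many more edges than the rest, which is what gives the skewed degree distribution the slide calls scale-free.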

  21. Another comparison with RDBMS
      • Test: execute Q1 for 5 specific nodes.
        • These query nodes have a significant number of outgoing edges:
          • Scale factor 16: some tens.
          • Scale factor 18: some hundreds.
      • Results:
        • Scale factor 16: reached about 160K nodes.
        • Scale factor 18: reached about 600K nodes.

  22. Another comparison with RDBMS
      • Schema:
          CREATE TABLE `edges` (
            `src` int(11) NOT NULL,
            `dst` int(11) NOT NULL,
            INDEX `srcI` (`src`) USING BTREE,
            INDEX `dstI` (`dst`) USING BTREE
          ) ENGINE=InnoDB;
      • Query:
          SELECT DISTINCT c.dst
          FROM edges AS a, edges AS b, edges AS c
          WHERE (a.dst = b.src AND b.dst = c.src AND a.src = node);
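
The 3-hop query above can be tried end to end with a small sketch. It uses SQLite through Python's `sqlite3` module rather than MySQL/InnoDB so that it is self-contained, which means the schema is ported (no `USING BTREE` or `ENGINE` clauses) and timings are not comparable with the slides. The toy edge list and both function names are hypothetical. The second function shows the same result computed as a graph traversal.

```python
import sqlite3

# Hypothetical toy edge list; the slides use R-MAT data instead.
edges = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 1)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src INTEGER NOT NULL, dst INTEGER NOT NULL)")
conn.execute("CREATE INDEX srcI ON edges (src)")
conn.execute("CREATE INDEX dstI ON edges (dst)")
conn.executemany("INSERT INTO edges VALUES (?, ?)", edges)


def three_hops_sql(node):
    """Two self-joins: one alias of the edge table per hop."""
    rows = conn.execute(
        "SELECT DISTINCT c.dst FROM edges AS a, edges AS b, edges AS c "
        "WHERE a.dst = b.src AND b.dst = c.src AND a.src = ?",
        (node,),
    )
    return {dst for (dst,) in rows}


def three_hops_traversal(node):
    """Same result by expanding a frontier three times over adjacency lists."""
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, []).append(dst)
    frontier = {node}
    for _hop in range(3):
        frontier = {nxt for cur in frontier for nxt in adj.get(cur, [])}
    return frontier


print(three_hops_sql(1), three_hops_traversal(1))
```

Each extra hop adds another self-join on the relational side, whereas the traversal only repeats the frontier expansion once more.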

  23. Results
      • Test platform:
        • MacBook, 2.4 GHz Intel Core 2 Duo (Mac OS X 10.6).
        • Up to 1 GB of memory for the MySQL buffer pool.
      • Results

  24. Any questions?
      • DAMA Group web site: www.dama.upc.edu
      • Sparsity web site: www.sparsity-technologies.com
