
Graph Data Management

Institute of Computer Science and Technology, Peking University (北京大学计算机科学技术研究所). Graph Data Management. Instructor: ZOU Lei, zoulei@pku.edu.cn. Outline: Applications and Challenges of Graph Data; NoSQL Systems; Existing Graph Database Systems; About the Course.

Presentation Transcript


  1. Institute of Computer Science and Technology, Peking University (北京大学计算机科学技术研究所) Graph Data Management Instructor: ZOU Lei, zoulei@pku.edu.cn

  2. Outline • Applications and Challenges of Graph Data • NoSQL Systems • Existing Graph Database Systems • About the Course

  3. Outline • Applications and Challenges of Graph Data • NoSQL Systems • Existing Graph Database Systems • About the Course

  4. Graph Data (a) Protein Network (b) Social Network

  5. Some Challenges in Large Graph Data Management • An example: consider an SNS website with more than 1 billion active users. Query: "Is Tom a friend of Jack, or a friend of one of Jack's friends, and so on?" Possible solution: • (Storage) Store the connections between individuals in a relational table • (Query) Perform self-joins on that table recursively

  6. Some Challenges in Large Graph Data Management: recursive queries
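
     To make the cost of the recursive self-join concrete, here is a minimal Java sketch, not taken from the slides. It assumes a hypothetical `Friend(src, dst)` table held in memory as an array of string pairs, and the class and method names are illustrative. Each loop round plays the role of one more self-join of the current frontier with the table.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: friend-of-a-friend-of-... lookup by iterated self-joins. */
final class RecursiveFriendJoin {

    /**
     * Returns true if target can be reached from source by following one or
     * more rows of the (hypothetical) Friend table. Each row is {src, dst}.
     */
    static boolean isConnected(String[][] friendTable, String source, String target) {
        Set<String> reached = new HashSet<>();   // everyone found so far
        Set<String> frontier = new HashSet<>();  // people found in the last round
        frontier.add(source);
        while (!frontier.isEmpty()) {
            Set<String> next = new HashSet<>();
            // One round = one self-join: frontier JOIN Friend ON person = src.
            // Note the full table scan per round; this is what hurts at scale.
            for (String[] row : friendTable) {
                if (frontier.contains(row[0]) && reached.add(row[1])) {
                    next.add(row[1]);
                }
            }
            if (reached.contains(target)) {
                return true;
            }
            frontier = next;
        }
        return false;
    }

    public static void main(String[] args) {
        String[][] friend = { {"Jack", "Ann"}, {"Ann", "Tom"}, {"Bob", "Jack"} };
        System.out.println(isConnected(friend, "Jack", "Tom")); // true: Jack -> Ann -> Tom
        System.out.println(isConnected(friend, "Tom", "Jack")); // false: rows are directed here
    }
}
```

     The sketch treats rows as directed; for symmetric friendship, each pair would be stored in both directions. The number of rounds grows with the length of the friendship chain, and every round rescans the whole table, which is the difficulty this slide points at.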

  7. Network Motifs: Simple Building Blocks of Complex Networks (R. Milo, et al., Science 2002)

  8. Network Motifs: Simple Building Blocks of Complex Networks (R. Milo, et al., Science 2002) • Network motifs are patterns (subgraphs) that recur within a network much more often than expected at random. Network motifs typically correspond to functional patterns in different networks. Questions: • How can such motifs be found efficiently? • Given a motif, how can all embeddings of this motif be found efficiently?
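
     As a small illustration of the second question, the sketch below counts the embeddings of one fixed motif, the three-vertex feed-forward loop (edges a->b, b->c, a->c), by brute force. The class name and the input encoding (vertices 0..n-1, edges as int pairs) are assumptions made for this example. It only enumerates embeddings; deciding that a pattern is a motif would additionally require comparing the count against randomized networks.

```java
/** Illustrative only: brute-force embedding count for one three-vertex motif. */
final class MotifCounter {

    /**
     * Counts ordered triples (a, b, c) of distinct vertices such that the
     * directed edges a->b, b->c and a->c all exist.
     */
    static int countFeedForwardLoops(int n, int[][] edges) {
        boolean[][] adj = new boolean[n][n];
        for (int[] e : edges) {
            adj[e[0]][e[1]] = true;
        }
        int count = 0;
        for (int a = 0; a < n; a++) {
            for (int b = 0; b < n; b++) {
                if (a == b || !adj[a][b]) {
                    continue;
                }
                for (int c = 0; c < n; c++) {
                    if (c != a && c != b && adj[b][c] && adj[a][c]) {
                        count++;
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[][] edges = { {0, 1}, {1, 2}, {0, 2}, {2, 3}, {1, 3} };
        // Embeddings: (0, 1, 2) and (1, 2, 3).
        System.out.println(countFeedForwardLoops(4, edges)); // 2
    }
}
```

     The triple loop is cubic in the number of vertices, which is acceptable for a toy graph but is exactly why the slide asks for efficient methods.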

  9. Frequent Subgraph Pattern Mining [Figure: a graph dataset with graphs (A), (B), (C), and the frequent patterns (1) and (2) found with minimum support 2]

  10. Subgraph Search Query: Which compounds contain a "benzene ring"? [Figure: a query graph matched against a graph database]

  11. Reachability Query • Query(1, 11)? Yes • Query(3, 9)? No [Figure: an example graph on vertices 1 to 15]
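
      A reachability query can be answered without any index by a plain graph traversal. The following minimal sketch uses an iterative depth-first search over adjacency lists. The edges in `main` are hypothetical, since the figure's edges are not recoverable from the transcript; they are chosen only so that the two queries on the slide give the same answers.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: index-free reachability over a directed graph. */
final class ReachabilityQuery {

    private final Map<Integer, List<Integer>> out = new HashMap<>();

    void addEdge(int from, int to) {
        out.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    /** Returns true if there is a directed path from source to target. */
    boolean query(int source, int target) {
        Deque<Integer> stack = new ArrayDeque<>();
        Set<Integer> visited = new HashSet<>();
        stack.push(source);
        visited.add(source);
        while (!stack.isEmpty()) {
            int v = stack.pop();
            if (v == target) {
                return true;
            }
            for (int w : out.getOrDefault(v, Collections.<Integer>emptyList())) {
                if (visited.add(w)) {   // add() is false if w was already seen
                    stack.push(w);
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ReachabilityQuery g = new ReachabilityQuery();
        // Hypothetical edges, not the ones drawn on the slide.
        g.addEdge(1, 3);
        g.addEdge(3, 6);
        g.addEdge(6, 11);
        g.addEdge(4, 9);
        System.out.println(g.query(1, 11)); // true
        System.out.println(g.query(3, 9));  // false
    }
}
```

      Each query may touch the whole graph in the worst case, which motivates the reachability indexes covered later in the course.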

  12. Shortest Path Distance Query What is the distance between two specified individuals?
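
      For an unweighted graph such as a friendship network, the distance between two individuals is the number of hops on a shortest path, and breadth-first search computes it. The sketch below assumes vertices numbered 0..n-1 with `adj[v]` listing the neighbours of `v`; this encoding and the class name are illustrative choices, not part of the slides.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Illustrative only: hop distance in an unweighted graph via BFS. */
final class ShortestPathDistance {

    /** Returns the number of edges on a shortest path, or -1 if unreachable. */
    static int distance(int[][] adj, int source, int target) {
        int[] dist = new int[adj.length];
        Arrays.fill(dist, -1);           // -1 marks "not reached yet"
        dist[source] = 0;
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(source);
        while (!queue.isEmpty()) {
            int v = queue.remove();
            if (v == target) {
                return dist[v];
            }
            for (int w : adj[v]) {
                if (dist[w] == -1) {
                    dist[w] = dist[v] + 1;
                    queue.add(w);
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Undirected toy graph: 0-1, 0-2, 1-3, 2-3, 3-4; vertex 5 is isolated.
        int[][] adj = { {1, 2}, {0, 3}, {0, 3}, {1, 2, 4}, {3}, {} };
        System.out.println(distance(adj, 0, 4)); // 3
        System.out.println(distance(adj, 0, 5)); // -1
    }
}
```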

  13. RDF Data Management The Resource Description Framework (RDF) is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata data model. WWW → Web of Pages; Semantic Web → Web of Data

  14. An RDF Data Example: the YAGO Project (structured data)

  15. An RDF Data Example

  16. An RDF Data Example

  17. SPARQL Query Query: Find all individuals who were born on Feb. 12, 1809 and died on April 15, 1865. SPARQL syntax:
      SELECT ?name
      WHERE {
        ?m <hasName> ?name .
        ?m <BornOnDate> "1809-02-12" .
        ?m <DiedOnDate> "1865-04-15" .
      }
      [Figure: the corresponding query graph]
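
      Evaluating such a query amounts to finding variable bindings under which every triple pattern matches some triple in the dataset. The sketch below is a naive backtracking matcher written for illustration; it is not how a real SPARQL engine is built. Triples and patterns are plain `String[3]` arrays, terms starting with `?` are variables, and the subject identifiers in `main` are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: naive evaluation of a conjunction of triple patterns. */
final class TriplePatternMatcher {

    /** Returns every variable binding under which all patterns match a triple. */
    static List<Map<String, String>> match(List<String[]> triples, List<String[]> patterns) {
        List<Map<String, String>> results = new ArrayList<>();
        extend(triples, patterns, 0, new HashMap<String, String>(), results);
        return results;
    }

    // Tries to match patterns[index..] given the bindings collected so far.
    private static void extend(List<String[]> triples, List<String[]> patterns, int index,
                               Map<String, String> binding, List<Map<String, String>> results) {
        if (index == patterns.size()) {
            results.add(new HashMap<>(binding));
            return;
        }
        for (String[] triple : triples) {
            Map<String, String> candidate = new HashMap<>(binding);
            if (unify(patterns.get(index), triple, candidate)) {
                extend(triples, patterns, index + 1, candidate, results);
            }
        }
    }

    // Matches one pattern against one triple, extending the binding in place.
    private static boolean unify(String[] pattern, String[] triple, Map<String, String> binding) {
        for (int i = 0; i < 3; i++) {
            String term = pattern[i];
            if (term.startsWith("?")) {
                String bound = binding.putIfAbsent(term, triple[i]);
                if (bound != null && !bound.equals(triple[i])) {
                    return false;   // variable already bound to something else
                }
            } else if (!term.equals(triple[i])) {
                return false;       // constant term does not match
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String[]> triples = Arrays.asList(
            new String[] {"Abraham_Lincoln", "hasName", "Abraham Lincoln"},
            new String[] {"Abraham_Lincoln", "BornOnDate", "1809-02-12"},
            new String[] {"Abraham_Lincoln", "DiedOnDate", "1865-04-15"},
            new String[] {"Charles_Darwin", "hasName", "Charles Darwin"},
            new String[] {"Charles_Darwin", "BornOnDate", "1809-02-12"},
            new String[] {"Charles_Darwin", "DiedOnDate", "1882-04-19"});
        List<String[]> patterns = Arrays.asList(
            new String[] {"?m", "hasName", "?name"},
            new String[] {"?m", "BornOnDate", "1809-02-12"},
            new String[] {"?m", "DiedOnDate", "1865-04-15"});
        for (Map<String, String> row : match(triples, patterns)) {
            System.out.println(row.get("?name")); // Abraham Lincoln
        }
    }
}
```

      Because `?m` is shared by all three patterns, the matcher performs a join on the subject, which is the same operation the query graph expresses as a single vertex with three outgoing edges.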

  18. An RDF Data Example

  19. Outline • Applications and Challenges of Graph Data • NoSQL Systems • Existing Graph Database Systems • About the Course

  20. NoSQL Databases • Key-value Store -- e.g., Berkeley DB

  21. NoSQL Databases • Column Family Store -- e.g., Hadoop/HBase, Cassandra, Hypertable, ... This model is an evolution of the key-value model. [1] Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Michael Burrows, Tushar Chandra, Andrew Fikes, Robert Gruber: Bigtable: A Distributed Storage System for Structured Data. OSDI 2006: 205-218
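
      One way to see the column-family model as an evolution of key-value storage is that the key becomes a (row key, column, timestamp) triple kept in sorted order, as in the Bigtable paper [1]. The single-machine sketch below models this with nested sorted maps. The class and method names are hypothetical, and the sample row in `main` is loosely modeled on the paper's web-table example.

```java
import java.util.Collections;
import java.util.TreeMap;

/** Illustrative only: (row, column, timestamp) -> value as nested sorted maps. */
final class ColumnFamilyTable {

    // row key -> "family:qualifier" -> timestamp (newest first) -> value
    private final TreeMap<String, TreeMap<String, TreeMap<Long, String>>> rows = new TreeMap<>();

    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<String, TreeMap<Long, String>>())
            .computeIfAbsent(column,
                c -> new TreeMap<Long, String>(Collections.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    /** Returns the most recent version of a cell, or null if the cell is absent. */
    String getLatest(String row, String column) {
        TreeMap<String, TreeMap<Long, String>> columns = rows.get(row);
        if (columns == null) {
            return null;
        }
        TreeMap<Long, String> versions = columns.get(column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    public static void main(String[] args) {
        ColumnFamilyTable table = new ColumnFamilyTable();
        table.put("com.cnn.www", "contents:", 3L, "<html>v3");
        table.put("com.cnn.www", "contents:", 5L, "<html>v5");
        table.put("com.cnn.www", "anchor:cnnsi.com", 9L, "CNN");
        System.out.println(table.getLatest("com.cnn.www", "contents:")); // <html>v5
    }
}
```

      Rows are sparse: a row stores only the columns actually written, and each cell keeps multiple timestamped versions. A plain key-value store corresponds to the special case of one column and one version per row.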

  22. NoSQL Databases • Document Store -- e.g., MongoDB, ...

  23. Outline • Applications and Challenges of Graph Data • NoSQL Systems • Existing Graph Database Systems • About the Course

  24. Existing Graph Database Systems The following is a list of several well-known graph database projects: • HyperGraphDB - an open-source (LGPL) graph database supporting generalized hypergraphs, where edges can point to other edges • InfoGrid - an open-source / commercial (AGPLv3, free for small entities) graph database with a web front end and configurable storage engines (MySQL, PostgreSQL, files, Hadoop)

  25. Some Existing Graph Database Systems • Neo4j - an open-source / commercial (AGPLv3) graph database • DEX - a high-performance graph database • and so on. International Graph Database Workshops: http://www.icst.pku.edu.cn/IWGD2010/index.html http://www.cse.unsw.edu.au/~gdm2011/

  26. An Example of Neo4j Finding the friends of "Thomas Anderson", and the friends of those friends too • Neo4j http://wiki.neo4j.org/content/The_Matrix

  27. Neo4j API: An Example
      private void printFriends( Node person )
      {
          Traverser traverser = person.traverse(
              Order.BREADTH_FIRST,                    // traversal order
              StopEvaluator.END_OF_GRAPH,             // when to stop traversing
              ReturnableEvaluator.ALL_BUT_START_NODE, // which nodes are returned
              MyRelationshipTypes.KNOWS,              // which relationship type to follow
              Direction.OUTGOING );                   // direction of traversal
          for ( Node friend : traverser )
          {
              System.out.println( friend.getProperty( "name" ) );
          }
      }

  28. Outline • Applications and Challenges of Graph Data • NoSQL Systems • Existing Graph Database Systems • About the Course

  29. Course Content • Graph Mining - frequent subgraph mining • Indexing & Query Processing - reachability query - shortest path query - subgraph query - keyword search • RDF Data Management - Indexing & SPARQL Query Processing - RDF Dataset Construction

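  30. Course Website • URL: http://www.icst.pku.edu.cn/course/Graphdb/index.html • Textbooks (author, title, publisher, year): 1. Data Mining: Concepts and Techniques (《数据挖掘概念与技术》), by Jiawei Han & Micheline Kamber, translated by Fan Ming & Meng Xiaofeng, China Machine Press (2nd edition) 2. Managing and Mining Graph Data, edited by Charu C. Aggarwal and Haixun Wang, Kluwer Academic Publishers, 2009 3. A Semantic Web Primer (《语义网基础》), by Grigoris Antoniou and Frank van Harmelen, China Machine Press, 2008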

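  31. Assessment • In-class presentation (30%): each student presents one top-tier paper from the database field (including related areas such as data mining and information retrieval); 20 minutes plus 5 minutes of questions • Assignments (30%): 3 assignments, completing 3 projects • Class participation (10%)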

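  32. Assessment • Course research report (30%): the report takes one of two forms, chosen by the student: 1) Literature survey: introduce the research background of the topic and the related existing work, and give your own commentary on the different existing results. 2) Research paper: students are encouraged to carry out original research on a specific topic and write it up as a paper.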

  33. Self-Study Material • Neo4j, http://neo4j.org/ • Freebase, http://www.freebase.com/ Freebase is a large collaborative knowledge base consisting of metadata composed mainly by its community members. It is an online collection of structured data harvested from many sources, including individual 'wiki' contributions. http://wiki.freebase.com/wiki/Freebase_API http://wiki.freebase.com/wiki/Libraries

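  34. Course Objectives • Master several basic query algorithms and mining algorithms for graph databases • Understand how graph database technology is applied in different domains • Develop students' ability to think independently and to carry out research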

  35. zoulei@pku.edu.cn Teaching assistants: Qu Cheng (曲丞) qucheng@pku.edu.cn, He Binbin (贺斌斌) hebinbin@pku.edu.cn Let's begin!

  36. References • R. Milo, et al.: Network Motifs: Simple Building Blocks of Complex Networks. Science 298: 824 (2002) • Tim Berners-Lee, Lalana Kagal: The Fractal Nature of the Semantic Web. AI Magazine 29(3): 29-34 (2008) • Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Michael Burrows, Tushar Chandra, Andrew Fikes, Robert Gruber: Bigtable: A Distributed Storage System for Structured Data (awarded Best Paper). OSDI 2006: 205-218 • Neo4j, http://neo4j.org/ • Kurt D. Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamie Taylor: Freebase: a collaboratively created graph database for structuring human knowledge. SIGMOD Conference 2008: 1247-1250 • Renzo Angles, Claudio Gutiérrez: Survey of graph database models. ACM Comput. Surv. 40(1) (2008)
