Aules d’empresa 2011: Hands-on course
Contents • Introduction • DEX API • Running example • Database construction • Validate database construction • Script loaders • Query database • Graph algorithms
DEX, a graph database • Graph databases focus on the structure of the model. • Nodes and edges instead of tables. • Relations are implicit in the model. • DEX is a programming library for managing a graph database. • Very large datasets. • High-performance query processing.
Basic concepts • Programming library for managing persistent and temporary graphs. • Data model: typed and attributed directed multigraph. • Node and edge instances belong to a type (label). • Node and edge instances may have attribute values. • Edges can be directed or undirected. • Multiple edges are allowed between two nodes. • Types of edges: • Materialized: directed and undirected. • Virtual: constrained by the values of two attributes (foreign keys). • Just for navigation.
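The data model above can be illustrated with a small in-memory sketch. It does not use the DEX library: the class `MultigraphSketch` and all of its members are hypothetical names chosen for illustration, types are plain strings, and attributes are untyped. It only shows what "typed, attributed, directed multigraph" means: every node and edge has a type and an identifier, may carry attribute values, and two nodes may be connected by several edges.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java sketch of a typed, attributed, directed multigraph (not the DEX API). */
final class MultigraphSketch {
    /** An edge instance: its type (label) plus tail and head node identifiers. */
    static final class Edge {
        final String type;
        final long tail;
        final long head;

        Edge(String type, long tail, long head) {
            this.type = type;
            this.tail = tail;
            this.head = head;
        }
    }

    private long nextOid = 1;                                      // one id space for nodes and edges
    private final Map<Long, String> nodeTypes = new HashMap<>();   // node oid -> type
    private final Map<Long, Edge> edges = new HashMap<>();         // edge oid -> edge
    private final Map<Long, Map<String, Object>> attributes = new HashMap<>(); // oid -> values

    long newNode(String type) {
        long oid = nextOid++;
        nodeTypes.put(oid, type);
        return oid;
    }

    /** Always creates a new edge, so parallel edges between the same nodes are allowed. */
    long newEdge(String type, long tail, long head) {
        long oid = nextOid++;
        edges.put(oid, new Edge(type, tail, head));
        return oid;
    }

    void setAttribute(long oid, String name, Object value) {
        attributes.computeIfAbsent(oid, k -> new HashMap<>()).put(name, value);
    }

    /** Returns the attribute value, or null if the object has no value for it. */
    Object getAttribute(long oid, String name) {
        Map<String, Object> values = attributes.get(oid);
        return values == null ? null : values.get(name);
    }

    /** Counts the edges of the given type that go from tail to head. */
    long countEdges(String type, long tail, long head) {
        return edges.values().stream()
                .filter(e -> e.type.equals(type) && e.tail == tail && e.head == head)
                .count();
    }
}
```

Creating two "PHONES" edges between the same pair of nodes yields two distinct edge identifiers, which is the multigraph property used later in the running example.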
API • Java library: jdex.jar public API • Native library • Linux: libjdex.so • Windows: jdex.dll • System requirements: • Java Runtime Environment, v1.5 or higher. • Operating system: • Windows: 32-bit • Linux: 32-bit and 64-bit
Core API – class diagram [Figure: class diagram relating DEX (graph factory), GraphPool, Session, Graph, DbGraph (persistent DB), RGraph (temporary) and Objects (set of OIDs), with 1-to-N associations between them]
Core API – main methods
• DEX: open(filename) → GraphPool, create(filename) → GraphPool, close()
• GraphPool: newSession() → Session
• Session: getDbGraph() → DbGraph, newGraph() → RGraph, close()
• Graph: newNodeType(name) → int, newEdgeType(name) → int, newNode(type) → long, newEdge(type) → long, newAttribute(type, name) → long, setAttribute(oid, attr, value), getAttribute(oid, attr) → value, select(type) → Objects, select(attr, op, value) → Objects, explode(oid, type) → Objects
• Objects: add(long), exists(long), copy(objs), union(objs), intersection(objs), difference(objs)
• Objects.Iterator: hasNext() → boolean, next() → long
Running example
DEX dex = new DEX();
GraphPool gpool = dex.create("C:/image.dex");
Session s = gpool.newSession();
…
s.close();
gpool.close();
dex.close();
Running example
…
s.beginTx();
DbGraph dbg = s.getDbGraph();
int person = dbg.newNodeType("PERSON");
long name = dbg.newAttribute(person, "NAME", STRING);
long age = dbg.newAttribute(person, "AGE", INT);
long p1 = dbg.newNode(person);
dbg.setAttribute(p1, name, "JOHN");
dbg.setAttribute(p1, age, 18);
long p2 = dbg.newNode(person);
dbg.setAttribute(p2, name, "KELLY");
long p3 = dbg.newNode(person);
dbg.setAttribute(p3, name, "MARY");
s.commitTx();
…
[Figure: nodes JOHN (AGE 18), KELLY and MARY]
Running example
…
s.beginTx();
DbGraph dbg = s.getDbGraph();
int friend = dbg.newUndirectedEdgeType("FRIEND");
long since = dbg.newAttribute(friend, "SINCE", INT);
long e1 = dbg.newEdge(p1, p2, friend);
dbg.setAttribute(e1, since, 2000);
long e2 = dbg.newEdge(p2, p3, friend);
dbg.setAttribute(e2, since, 1995);
…
int loves = dbg.newEdgeType("LOVES");
long e3 = dbg.newEdge(p1, p3, loves);
s.commitTx();
…
[Figure: FRIEND edges JOHN-KELLY (SINCE 2000) and KELLY-MARY (SINCE 1995); LOVES edge from JOHN to MARY]
Running example
…
s.beginTx();
DbGraph dbg = s.getDbGraph();
int phones = dbg.newEdgeType("PHONES");
long when = dbg.newAttribute(phones, "WHEN", TIMESTAMP);
long e4 = dbg.newEdge(p1, p3, phones);
dbg.setAttribute(e4, when, 4pm);
long e5 = dbg.newEdge(p1, p3, phones);
dbg.setAttribute(e5, when, 5pm);
long e6 = dbg.newEdge(p3, p2, phones);
dbg.setAttribute(e6, when, 6pm);
s.commitTx();
…
[Figure: PHONES edges from JOHN to MARY (4pm and 5pm) and from MARY to KELLY (6pm), added to the FRIEND and LOVES edges]
Running example
…
s.beginTx();
DbGraph dbg = s.getDbGraph();
Objects persons = dbg.select(person);
Objects.Iterator it = persons.iterator();
while (it.hasNext()) {
long p = it.next();
String personName = dbg.getAttribute(p, name);
}
it.close();
persons.close();
s.commitTx();
…
[Figure: the example graph with its PERSON nodes and FRIEND, LOVES and PHONES edges]
Running example
…
Objects objs1 = dbg.select(when, >=, 5pm); // objs1 = { e5, e6 }
Objects objs2 = dbg.explode(p1, phones, OUT); // objs2 = { e4, e5 }
Objects objs = objs1.intersection(objs2); // objs = { e5, e6 } ∩ { e4, e5 } = { e5 }
…
objs.close();
objs1.close();
objs2.close();
…
[Figure: the example graph with its PERSON nodes and FRIEND, LOVES and PHONES edges]
Database construction • DEX basics: • Node and edge type: • Public identifier: String. • DEX identifier: Integer. • Attribute: • Public identifier: String. • DEX identifier: Long. • Object instances: • DEX identifier (OID): Long.
Database construction • Nodes: • int Graph#newNodeType(String name) • Creates a new node type with the given unique name. • Returns the DEX node type identifier. • long Graph#newNode(int nodeType) • Creates a new object belonging to the given node type. • Returns the DEX object identifier.
Database construction • Edges: • int Graph#newEdgeType(String name, boolean directed) • Creates a new edge type with the given unique name. • Directed or undirected edge type. • Returns the DEX edge type identifier. • int Graph#newRestrictedEdgeType(String name, int srcNodeType, int dstNodeType) • Creates a new directed edge type with the given unique name. • Returns the DEX edge type identifier. • (Integrity restriction) Source and destination of the edge are restricted to the given node types. • long Graph#newEdge(long tail, long head, int edgeType) • Creates a new edge belonging to the given edge type. • Tail is the source and head is the target (if the edge type is directed). • Returns the DEX object identifier.
Database construction • Attributes: • long Graph#newAttribute(int type, String name, short dataType, short kind) • Creates a new attribute with the given unique name for the given node or edge type. • Returns the DEX attribute identifier. • “dataType” can be: Value#STRING, Value#INT, Value#LONG, Value#DOUBLE, Value#BOOL, Value#TIMESTAMP. • “kind” can be: • Graph#ATTR_KIND_BASIC. Basic attribute (just set and get values). • Graph#ATTR_KIND_INDEXED. Indexed attribute (set and get values as well as select operations). • Graph#ATTR_KIND_UNIQUE. Indexed attribute whose values are unique (PK).
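The difference between the indexed and unique kinds can be sketched in plain Java. This is not how DEX implements attributes: `IndexedAttributeSketch` and its methods are hypothetical, values are restricted to strings, and the index is an ordinary hash map. The sketch assumes only what the slide states: an indexed attribute supports select operations in addition to set and get, and a unique attribute behaves like a primary key.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Plain-Java sketch of an indexed attribute, optionally unique (not the DEX API). */
final class IndexedAttributeSketch {
    private final boolean unique;
    private final Map<Long, String> valueOf = new HashMap<>();     // oid -> value
    private final Map<String, Set<Long>> index = new HashMap<>();  // value -> oids holding it

    IndexedAttributeSketch(boolean unique) {
        this.unique = unique;
    }

    void set(long oid, String value) {
        Set<Long> holders = index.get(value);
        // A unique attribute rejects a value already held by a different object.
        if (unique && holders != null && !holders.isEmpty() && !holders.contains(oid)) {
            throw new IllegalStateException("duplicate value for unique attribute: " + value);
        }
        String old = valueOf.put(oid, value);
        if (old != null) {
            index.get(old).remove(oid);  // drop the stale index entry
        }
        index.computeIfAbsent(value, k -> new HashSet<>()).add(oid);
    }

    String get(long oid) {
        return valueOf.get(oid);
    }

    /** Equality select answered from the index, without scanning every object. */
    Set<Long> select(String value) {
        return new HashSet<>(index.getOrDefault(value, Collections.<Long>emptySet()));
    }
}
```

A basic attribute would keep only the `valueOf` map, so finding the objects that hold a given value would require visiting every object.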
Database construction • Attributes: • Class Value encapsulates different data types: • String, Integer, Long, Double, Boolean, Timestamp. • Use them to set and get attribute values for the objects. • Graph#setAttribute(long oid, long attr, Value v) • Sets the given Value for the given attribute of the given object identifier. • The given attribute must be defined for the object’s type. • The Value’s data type must match the attribute’s data type, or be NULL. • Graph#getAttribute(long oid, long attr, Value v) • Gets the Value of the given attribute for the given object identifier. • The given attribute must be defined for the object’s type.
Exercises • All exercises are in the NetBeans project. • Open the IDE and the project. • Required data sets are stored in the “data” directory. • Required libraries are stored in the “libs” directory. • All exercises have a main method to be executed.
Exercise 1 • Create a synthetic DEX database: • Create the following schema. • User (nickname string, …) • Tweet (body string, …) • tweets (…) // from User to Tweet • Add some data. • APIs to be used: • Graph#newNodeType / Graph#newEdgeType • Graph#newNode / Graph#newEdge • Graph#newAttribute / Graph#setAttribute • Value
Validate database construction • APIs: • GraphPool#dumpData(File f) • Dumps a summary of the logical content of the graph database. • GraphPool#dumpStorage(File f) • Dumps internal information about the storage content of the graph database. • Graph#export(PrintWriter pw, short kind, Export e) • Exports the graph to an external format. • “kind” can be: GRAPHVIZ or YGRAPHML. • The Export implementation defines the visualization (if null, default export). • Command-line shell: • edu.upc.dama.dex.shell.Shell • See the edu.upc.dama.dex.shell package description.
Exercise 2 • Validate your database construction: • Dump data summary. • Dump storage summary. • Default export. • yEd • (Optional) Shell. • APIs to be used: • Graph#dumpData. • Graph#dumpStorage. • Graph#export. • Shell.
Script loaders • Schema definition
CREATE DBGRAPH alias INTO filename
CREATE NODE node_type_name "(" [attribute_name (INT|LONG|DOUBLE|STRING|BOOLEAN|TIMESTAMP|TEXT) [INDEXED|UNIQUE|BASIC], ...] ")"
CREATE [UNDIRECTED|VIRTUAL] EDGE edge_type_name [FROM node_type_name[.attribute_name] TO node_type_name[.attribute_name]] "(" [attribute_name (INT|LONG|DOUBLE|STRING|BOOLEAN|TIMESTAMP|TEXT) [INDEXED|UNIQUE|BASIC], ...] ")" [MATERIALIZE NEIGHBORS]
Script loaders • Load nodes
LOAD NODES file_name
COLUMNS attribute_name [alias_name], …
INTO node_type_name
[IGNORE (attribute_name|alias_name), …]
[FIELDS [TERMINATED char] [ENCLOSED char] [ALLOW_MULTILINE]]
[FROM num] [MAX num]
[MODE (ROWS|COLUMNS [SPLIT [PARTITIONS num]])]
Script loaders • Load edges
LOAD EDGES file_name
COLUMNS attribute_name [alias_name], …
INTO edge_type_name
[IGNORE (attribute_name|alias_name), …]
WHERE TAIL (attribute_name|alias_name) = node_type_name.attribute_name
HEAD (attribute_name|alias_name) = node_type_name.attribute_name
[FIELDS [TERMINATED char] [ENCLOSED char] [ALLOW_MULTILINE]]
[FROM num] [MAX num]
[MODE (ROWS|COLUMNS [SPLIT [PARTITIONS num]])]
Script loaders • APIs: • edu.upc.dama.dex.script.ScriptParser • Command-line tool: • edu.upc.dama.dex.script.ScriptParser • See the edu.upc.dama.dex.script package description.
Twitter data model • This is the data model, based on Twitter, to be used during the exercises.
Exercise 3 • Create the Twitter database: • Complete the schema definition script. • Complete the loader script. • APIs to be used: • ScriptParser. • Resources: • CSV files in the “data/twitter” directory. • Script files in the “data/twitter/scripts” directory (*.des).
Exercise 4 • Once again, validate your database construction: • Dump data summary. • Dump storage summary. • Default export. • yEd • (Optional) Shell • APIs to be used: • Graph#dumpData • Graph#dumpStorage • Graph#export • Shell
Query database • Retrieve data: • Class Objects • Set<Long> • Iterable<Long> • Stores large sets of object identifiers. • No order. • Combine operations: • Union. • Intersection. • Difference.
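Since the class is described as a Set<Long>, the three combine operations can be tried out with ordinary java.util sets. The sketch below is a stand-in, not the DEX class: `OidSetOps` and its helper `of` are hypothetical names, and the identifiers 4, 5 and 6 stand for the edges e4, e5 and e6 of the running example. Each operation returns a new set and leaves its arguments untouched.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Set algebra over object identifiers, with java.util sets standing in for Objects. */
final class OidSetOps {
    private OidSetOps() {
    }

    /** Convenience factory for small identifier sets. */
    static Set<Long> of(Long... oids) {
        return new HashSet<>(Arrays.asList(oids));
    }

    /** Identifiers present in a, in b, or in both. */
    static Set<Long> union(Set<Long> a, Set<Long> b) {
        Set<Long> result = new HashSet<>(a);
        result.addAll(b);
        return result;
    }

    /** Identifiers present in both a and b. */
    static Set<Long> intersection(Set<Long> a, Set<Long> b) {
        Set<Long> result = new HashSet<>(a);
        result.retainAll(b);
        return result;
    }

    /** Identifiers present in a but not in b. */
    static Set<Long> difference(Set<Long> a, Set<Long> b) {
        Set<Long> result = new HashSet<>(a);
        result.removeAll(b);
        return result;
    }
}
```

With objs1 = { e5, e6 } and objs2 = { e4, e5 }, `intersection` gives { e5 }, `union` gives { e4, e5, e6 }, and `difference(objs1, objs2)` gives { e6 }.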
Query database • Retrieve data: • Objects Graph#select(int t) • Retrieves the object identifiers belonging to the given node or edge type. • Objects Graph#select(long attr, short op, Value v) • Retrieves the object identifiers which satisfy the query. • “op” can be: Graph#OPERATION_{EQ|NE|GT|GE|LT|LE|LIKE|ERE} • long Graph#findObj(long attr, Value v) • Retrieves the object identifier which has the given value for the given attribute (or INVALID_OID if not found).
Query database • Navigation: • Objects Graph#explode(long oid, int edgeType, short direction) • Retrieves the outgoing or incoming edges (or both) from or to the given object for the given edge type. • “direction” can be: Graph#EDGES_IN, Graph#EDGES_OUT, Graph#EDGES_BOTH. • Objects Graph#neighbors(long oid, int edgeType, short direction) • Retrieves the neighbor nodes of the given object which can be reached through the given edge type and direction. • “direction” can be: Graph#EDGES_IN, Graph#EDGES_OUT, Graph#EDGES_BOTH.
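The two navigation operations differ in what they return: explode yields edge identifiers, neighbors yields node identifiers. The following plain-Java sketch shows that difference on the PHONES edges of the running example. It is an assumption-laden stand-in, not the DEX implementation: `NavigationSketch`, `Direction` and `addEdge` are hypothetical names, and a single edge type is assumed so the edge type parameter is omitted.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Plain-Java sketch of explode vs. neighbors over one edge type (not the DEX API). */
final class NavigationSketch {
    enum Direction { IN, OUT, BOTH }

    // edge oid -> {tail, head}
    private final Map<Long, long[]> edges = new LinkedHashMap<>();

    void addEdge(long oid, long tail, long head) {
        edges.put(oid, new long[] {tail, head});
    }

    /** Returns the identifiers of the edges leaving and/or entering the node. */
    Set<Long> explode(long node, Direction direction) {
        Set<Long> result = new HashSet<>();
        for (Map.Entry<Long, long[]> entry : edges.entrySet()) {
            boolean outgoing = entry.getValue()[0] == node;
            boolean incoming = entry.getValue()[1] == node;
            if ((outgoing && direction != Direction.IN)
                    || (incoming && direction != Direction.OUT)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    /** Returns the identifiers of the nodes at the other end of those edges. */
    Set<Long> neighbors(long node, Direction direction) {
        Set<Long> result = new HashSet<>();
        for (long[] edge : edges.values()) {
            if (edge[0] == node && direction != Direction.IN) {
                result.add(edge[1]);
            }
            if (edge[1] == node && direction != Direction.OUT) {
                result.add(edge[0]);
            }
        }
        return result;
    }
}
```

Because the result is a set, two parallel edges to the same node contribute two entries to explode but only one to neighbors.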
Graph algorithms • “edu.upc.dama.dex.algorithms” package. • Traversals: • Iterator<Long> • Returns node identifiers. • TraversalBFS • Breadth-first search. • TraversalDFS • Depth-first search. • Shortest path: • SinglePairShortestPathBFS • Unweighted graph. • SinglePairShortestPathDijkstra • Weighted graph. • The user can specify which node or edge types can be used for the navigation.
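The idea behind a single-pair shortest path on an unweighted graph can be sketched with a textbook breadth-first search. This is not the code of SinglePairShortestPathBFS: the class `BfsShortestPath`, its method and the adjacency-map input are assumptions made for illustration. The search visits nodes in order of increasing hop count and records each node's predecessor, so the first time the destination is reached the recorded chain is a shortest path.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Textbook BFS shortest path on an unweighted directed graph (not the DEX class). */
final class BfsShortestPath {
    private BfsShortestPath() {
    }

    /**
     * Returns the node sequence of a shortest path from src to dst,
     * or an empty list if dst cannot be reached.
     */
    static List<Long> shortestPath(Map<Long, List<Long>> adjacency, long src, long dst) {
        Map<Long, Long> parent = new HashMap<>();  // visited node -> predecessor
        Queue<Long> queue = new ArrayDeque<>();
        parent.put(src, src);
        queue.add(src);
        while (!queue.isEmpty()) {
            long current = queue.remove();
            if (current == dst) {
                break;  // first arrival is along a shortest path
            }
            for (long next : adjacency.getOrDefault(current, Collections.<Long>emptyList())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, current);
                    queue.add(next);
                }
            }
        }
        if (!parent.containsKey(dst)) {
            return Collections.emptyList();
        }
        // Walk the predecessor chain back from dst to src.
        LinkedList<Long> path = new LinkedList<>();
        for (long node = dst; node != src; node = parent.get(node)) {
            path.addFirst(node);
        }
        path.addFirst(src);
        return path;
    }
}
```

Restricting the navigation to certain node or edge types, as the library allows, would correspond to building the adjacency map from only those types.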
Attribute values • Class Values: • Iterator over the different values of an attribute. • Iterator<Value> • Ascending or descending order. • Retrieve Values: • Values Graph#getValues(long attr, short order) • Retrieves the Values for the given attribute. • “order” can be: Graph#ORDER_ASCENDENT, Graph#ORDER_DESCENDENT.
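What such an iterator returns can be pictured with a sorted set: each distinct value appears once, in the requested order. The sketch below is a plain-Java illustration only; `DistinctValuesSketch` is a hypothetical name, values are limited to integers, and a boolean flag replaces the order constants.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Distinct attribute values in ascending or descending order (not the DEX API). */
final class DistinctValuesSketch {
    private DistinctValuesSketch() {
    }

    static List<Integer> distinctValues(Collection<Integer> values, boolean ascending) {
        TreeSet<Integer> distinct = new TreeSet<>(values);  // removes duplicates and sorts
        NavigableSet<Integer> ordered = ascending ? distinct : distinct.descendingSet();
        return new ArrayList<>(ordered);
    }
}
```

For the SINCE values 2000, 1995 and 2000, the ascending result is [1995, 2000] and the descending result is [2000, 1995].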
Exercise 5 • Basic queries: • Get the “Tweet”s from a “User”. • 1-hop navigation. • Get the “Tweet”s which share 2 (or more) given “Hashtag”s. • Objects combination. • Shortest distance between two given “User”s. • Navigate only through the “follows” relationship. • Use the database created in Exercise 3. • APIs to be used: • Graph#findObj / Graph#select • Graph#neighbors • Objects • SinglePairShortestPath
Exercise 6 • Updates: • Create an attribute for each “User” to store the number of references (“depicts”) to the “User”. • Compute and store the value for each “User”. • Find the most popular “User”. • The most referenced one. • Use the database created in Exercise 3. • APIs to be used: • Graph#degree • Graph#newAttribute / Graph#setAttribute • Values
Export • Graph#export(PrintWriter pw, short kind, Export e) • “kind” can be: GRAPHVIZ or YGRAPHML. • Implement the Export interface to define the visualization. • NodeExport getNode(long oid) • It is called for each existing node identifier. • Returns a NodeExport instance which defines the visualization of the given node identifier. • EdgeExport getEdge(long oid) • It is called for each existing edge identifier. • Returns an EdgeExport instance which defines the visualization of the given edge identifier.
Exercise 7 • Visualization: • Update the given Export implementation. • Check how it changes the resulting visualization. • yEd • APIs to be used: • Export • GraphExport • NodeExport • EdgeExport • Graph#export
Any questions? • DAMA Group web site: www.dama.upc.edu • Sparsity web site: www.sparsity-technologies.com