P2P Database Systems
P2P Databases: Advertisement
Components: 1. Advertisement 2. Semantic Mapping 3. Indexing 4. P2P routing
[Figure: Local Data Model, drawn as a tree of schema elements (bookstore, magazine, book, author, price, name, first-name, last-name, award) with data values; legend: Schema Element, Data]
<bookstore specialty="novel">
  <book style="autobiography">
    <author>
      <first-name>Joe</first-name>
      <last-name>Bob</last-name>
      <award>Trenton Literary </award>
    </author>
    <price currency="CAD">12</price>
  </book>
  <book style="textbook">
    <author>
      <first-name>Mary</first-name>
      <last-name>Bob</last-name>
    </author>
    <price>55</price>
  </book>
  <magazine>
    <name>Times</name>
    <price currency="USD">4</price>
  </magazine>
</bookstore>
XPath Query
//author[award]
/bookstore/book[author/last-name='Bob']
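The two XPath queries above can be tried out against the bookstore advertisement. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath; the second query's nested-path condition is therefore checked by hand. The variable names (`DOC`, `awarded`, `bob_books`) are illustrative and not part of the slides.

```python
import xml.etree.ElementTree as ET

# The bookstore advertisement from the slide (attribute values quoted so it parses).
DOC = """
<bookstore specialty="novel">
  <book style="autobiography">
    <author>
      <first-name>Joe</first-name>
      <last-name>Bob</last-name>
      <award>Trenton Literary </award>
    </author>
    <price currency="CAD">12</price>
  </book>
  <book style="textbook">
    <author>
      <first-name>Mary</first-name>
      <last-name>Bob</last-name>
    </author>
    <price>55</price>
  </book>
  <magazine>
    <name>Times</name>
    <price currency="USD">4</price>
  </magazine>
</bookstore>
"""

root = ET.fromstring(DOC)  # root is the <bookstore> element

# //author[award]: every author element that has an <award> child.
awarded = root.findall(".//author[award]")

# /bookstore/book[author/last-name='Bob']: to stay within ElementTree's
# XPath subset, the nested-path condition is checked by hand per <book>.
bob_books = [
    book for book in root.findall("book")
    if any(ln.text == "Bob" for ln in book.findall("author/last-name"))
]

print([a.findtext("first-name") for a in awarded])  # ['Joe']
print([b.get("style") for b in bob_books])          # ['autobiography', 'textbook']
```

The first query selects only Joe Bob's author element, because it is the only one with an award; the second selects both books, since both authors share the last name Bob.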
RDFPeers: A Scalable Distributed RDF Repository Based on a Structured Peer-to-Peer Network
MAAN Protocol (MAAN: Multi-Attribute Addressable Network)
• Contains three classes of messages:
• (a) Topology maintenance – keeps neighbor connections and routing tables correct; includes JOIN/LEAVE, KEEPALIVE and other messages that stabilize the network structure
• (b) STORE – inserts triples into the network
• (c) SEARCH – visits the nodes where the triples in question are known to be stored, and returns the matched triples to the requesting node
RDF Triple Loader
• Reads an RDF document, parses it into RDF triples, and uses MAAN’s STORE message to store the triples in the RDFPeers network.
• Assigns each resource or literal a value using the SHA-1 hash function.
• When an RDFPeer receives a STORE message, it stores the triples in its Local RDF Triple Storage component.
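The loader's behavior can be sketched as follows. This is a toy model, not the RDFPeers implementation: `ToyRing`, `key_of`, the 16-bit identifier space and the node ids are all hypothetical, and the choice to index a triple under each of its three terms is an assumption made for the illustration. The literal "Min Cai" is likewise an invented sample value.

```python
import hashlib
from bisect import bisect_left

M = 16  # identifier bits of the toy ring (assumption; SHA-1 itself yields 160 bits)

def key_of(term: str) -> int:
    """SHA-1 hash of a resource or literal, reduced to the toy ring size."""
    digest = hashlib.sha1(term.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

class ToyRing:
    """Hypothetical stand-in for the network: node ids on a ring, each with local triple storage."""

    def __init__(self, node_ids):
        self.node_ids = sorted(node_ids)
        self.storage = {n: set() for n in self.node_ids}

    def successor(self, key: int) -> int:
        """First node id >= key, wrapping around to the smallest id."""
        i = bisect_left(self.node_ids, key)
        return self.node_ids[i % len(self.node_ids)]

    def store(self, triple):
        """Toy STORE: place the triple at the successor node of each term's key (assumption)."""
        for term in triple:
            self.storage[self.successor(key_of(term))].add(triple)

ring = ToyRing([1000, 20000, 40000, 60000])
triple = ("info:mincai", "foaf:name", "Min Cai")
ring.store(triple)
```

After `store`, the triple can be found by hashing any one of its terms and visiting the successor node of that key.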
Native Query
• The native query resolver parses native RDFPeers queries and uses MAAN’s SEARCH message to resolve them (using the successor routing algorithm).
• EXAMPLE: (<info:mincai>, <foaf:name>, ?name)
Continued… • Atomic Triple Patterns
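Once a SEARCH message reaches a node, an atomic triple pattern such as the example above has to be matched against the triples stored there. A minimal sketch of that local matching step, assuming variables are written with a leading `?`; the function name `match` and the sample triples are hypothetical.

```python
def match(pattern, triple):
    """Return the variable bindings if `triple` matches `pattern`, else None."""
    bindings = {}
    for p, value in zip(pattern, triple):
        if p.startswith("?"):
            # A variable binds to the value; a repeated variable must bind consistently.
            if bindings.setdefault(p, value) != value:
                return None
        elif p != value:
            return None
    return bindings

# Invented sample triples for illustration only.
local_triples = [
    ("info:mincai", "foaf:name", "Min Cai"),
    ("info:mincai", "foaf:knows", "info:other"),
]

query = ("info:mincai", "foaf:name", "?name")
results = [b for t in local_triples if (b := match(query, t)) is not None]
print(results)  # [{'?name': 'Min Cai'}]
```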
DATA MODEL - XML fragments
• A fragment is a subdocument of the original document.
• Given a document D, a fragment of D is a subtree of D, identified by its absolute linear path.
DATA MODEL - Identifier
Given a fragment, its identifier is the path from the document root to the fragment root:
• /site
• /site/regions/namerica
• /site/regions/namerica/item[1]/quantity
• /site/regions/europe/item[1]/quantity
DATA MODEL - Super and Child Fragments
• Super fragment: the fragment that is the ancestor of the current fragment; its path expression is ps
• Child fragments: the fragments labeled ‘sub’; their path expressions are pc
DATA MODEL - Example
• Fragment: /site/regions/namerica
• ps: /site (not …regions!)
• pc: /site/regions/namerica/item[1]/quantity
• pc: /site/regions/namerica/item[1]/descr
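The example can be reproduced mechanically. The sketch below assumes that the super fragment of a fragment is its nearest ancestor among the fragments that actually exist, which is why ps is /site and not /site/regions. The helper names are hypothetical.

```python
def steps(path):
    """Split an absolute linear path into its steps."""
    return [s for s in path.split("/") if s]

def is_ancestor(a, b):
    """True if path `a` is a proper step-wise prefix of path `b`."""
    sa, sb = steps(a), steps(b)
    return len(sa) < len(sb) and sb[:len(sa)] == sa

def super_fragment(frag, fragments):
    """ps: the deepest existing fragment that is an ancestor of `frag` (assumption)."""
    ancestors = [f for f in fragments if is_ancestor(f, frag)]
    return max(ancestors, key=lambda f: len(steps(f)), default=None)

def child_fragments(frag, fragments):
    """pc: all fragments whose super fragment is `frag`."""
    return sorted(f for f in fragments if super_fragment(f, fragments) == frag)

fragments = {
    "/site",
    "/site/regions/namerica",
    "/site/regions/namerica/item[1]/quantity",
    "/site/regions/namerica/item[1]/descr",
    "/site/regions/europe/item[1]/quantity",
}

print(super_fragment("/site/regions/namerica", fragments))
print(child_fragments("/site/regions/namerica", fragments))
```

Since /site/regions is not itself a fragment in this set, the lookup skips over it and reports /site as the super fragment.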
XPath in DHT
• Paths come as identifiers
• Each fragment is accessed through its own path
• Super-fragment and child-fragment paths are stored within the local peer
• Which mechanism to use? A lightweight DHT implementation based on Chord
• Maintains a list of successors at logarithmically spaced distances
• Guarantees efficient access
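The "successors at log distance" idea is Chord's finger table: node n keeps, for each i, the first node at or after n + 2^i on the ring. The sketch below shows plain Chord, not XP2P's extension of it; the function names and the sample ring of ten nodes in a 6-bit identifier space are illustrative.

```python
from bisect import bisect_left

def successor(node_ids, key):
    """First node id >= key on the ring (node_ids must be sorted), wrapping around."""
    i = bisect_left(node_ids, key)
    return node_ids[i % len(node_ids)]

def finger_table(n, node_ids, m):
    """Finger i of node n points to successor((n + 2**i) mod 2**m), for i = 0..m-1."""
    return [successor(node_ids, (n + 2 ** i) % 2 ** m) for i in range(m)]

nodes = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])
print(finger_table(8, nodes, 6))  # [14, 14, 14, 21, 32, 42]
```

With m fingers per node, each hop can roughly halve the remaining distance to the target key, which is where the efficient (logarithmic) access comes from.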
XP2P: extension of the Chord ring
• XML fragments and XPaths are placed along the ring:
• transform a pc (ps) into an Nx
Fingerprinting path expression
• Hash functions (e.g., SHA-1) are fine, but there is a more suitable solution:
• Instead of hashing path expressions, XP2P reduces them to shorter fingerprints.
• Two main advantages:
• Concatenation property
• Authenticity of data content
FINGERPRINTING PATH EXPRESSION
• Due to Michael Rabin
• Let $A = (a_1, a_2, \ldots, a_m)$ be a binary string.
• $A(t) = a_1 t^{m-1} + a_2 t^{m-2} + \cdots + a_m$
• Let $P(t)$ be an irreducible polynomial.
• The fingerprint of $A$ is $f(A) = A(t) \bmod P(t)$.
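The definition above can be turned into a few lines of code by treating the bits of the input as coefficients over GF(2) and reducing with Horner's rule. This is a minimal sketch: the polynomial `P` below (x^8 + x^4 + x^3 + x + 1, a well-known irreducible polynomial of degree 8) is chosen only to keep the example small, and the function name `fingerprint` is hypothetical.

```python
P = 0x11B  # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2); illustration only

def fingerprint(data: bytes, poly: int = P) -> int:
    """Compute A(t) mod P(t) over GF(2), where the bits of `data` are a_1..a_m."""
    k = poly.bit_length() - 1          # degree of P(t)
    f = 0
    for byte in data:
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            f = (f << 1) | bit         # multiply by t, add the next coefficient
            if f >> k:                 # degree reached k: subtract (XOR) P(t)
                f ^= poly
    return f

print(fingerprint(b"/site/regions/namerica"))
print(fingerprint(b"\x01\x1b"))  # the bit string of P itself reduces to 0
```

Addition and subtraction over GF(2) are both XOR, so the reduction step never needs a real division.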
FINGERPRINTING PATH EXPRESSION
• Concatenation property: $f(\mathrm{concat}(A, B)) = f(\mathrm{concat}(f(A), B))$
• The fingerprinting polynomial is the key:
• Degree of 64
• Acceptable probability of $2^{-10}$
• Path expressions of 50 steps
• Maximum of $2^{30}$ fragments
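A short derivation shows why the concatenation property holds. It is a sketch that assumes $B$ is a binary string of length $\ell$ (the symbol $\ell$ is introduced here) and that $f(A)$ is read back as a binary string whose polynomial is $f(A)$ itself.

```latex
\begin{align*}
\mathrm{concat}(A, B)(t) &= A(t)\, t^{\ell} + B(t) \\
f(A) &\equiv A(t) \pmod{P(t)} \\
\Rightarrow\quad f(A)\, t^{\ell} + B(t) &\equiv A(t)\, t^{\ell} + B(t) \pmod{P(t)} \\
\Rightarrow\quad f(\mathrm{concat}(f(A), B)) &= f(\mathrm{concat}(A, B))
\end{align*}
```

In practice this means the fingerprint of a longer path can be computed from the fingerprint of its prefix plus the appended steps, without reprocessing the prefix.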
FINGERPRINTING PATH EXPRESSION
• An extension of Chord is used: fingerprinting instead of hashing
• Each peer stores minimal access information:
• The fingerprint of its own identifier
• The fingerprint of the super-fragment path ps
• A list of fingerprints of the path expressions pc of the external sub-fragments
Fragment Lookup - partial lookup
• The fragment is returned as soon as the node holding the XPath identifier is found.
• Sub tags are not followed.
Fragment Lookup – full lookup
• The fragment is returned once the full fragment has been retrieved by following its sub tags.
XPath Expression Lookup (child axis only)
Full match attempt
• 1. Fingerprint the XPath expression into the Chord ring.
• 2. Look up the exact match.
• a) Exact match found: the algorithm stops.
• b) Not found: go to the next step.
Partial match (bottom-up steps)
• 3. Prune one step from the path expression.
• 4. Check for a match. If none is found, go to step 3.
Partial match (top-down steps)
• 5. Analyze the local content for a match. If there is no match, proceed to the sub-fragments in top-down fashion.
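Steps 1 to 4 above can be sketched as a loop that tries the full path first and then prunes one step at a time from the end. This is a simplified illustration: SHA-1 is swapped in for XP2P's fingerprint, the ring is replaced by a plain dictionary, and the top-down phase (step 5) is not implemented; the function only returns the steps that remain to be resolved top-down. All names (`fp`, `lookup`, `peer-A`, `peer-B`) are hypothetical.

```python
import hashlib

def fp(path):
    """Stand-in for the fingerprint of a path expression (SHA-1 swapped in for brevity)."""
    return hashlib.sha1(path.encode("utf-8")).hexdigest()

def split_steps(path):
    return [s for s in path.split("/") if s]

def lookup(path, index):
    """Return (matched_path, peer, remaining_steps) for the longest stored prefix of `path`."""
    steps = split_steps(path)
    # cut == len(steps) is the full match attempt; smaller cuts are bottom-up pruning.
    for cut in range(len(steps), 0, -1):
        prefix = "/" + "/".join(steps[:cut])
        peer = index.get(fp(prefix))
        if peer is not None:
            return prefix, peer, steps[cut:]
    return None, None, steps

# Toy index: fingerprint of a fragment identifier -> peer holding that fragment.
index = {fp("/site"): "peer-A", fp("/site/regions/namerica"): "peer-B"}

print(lookup("/site/regions/namerica", index))
print(lookup("/site/regions/namerica/item[1]/quantity", index))
```

The first call is an exact match with nothing left to resolve; the second prunes two steps before it hits the namerica fragment and hands back `['item[1]', 'quantity']` for the top-down phase.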
Two-way navigation
[Figure: lookup of /private[2]/professor/personalData[1]/pictures across the fragments private[2], professor, personalData[1] and pictures, with bottom-up steps labeled bu1, bu2 and top-down steps labeled td1, td2]
XPath Expression Lookup (containing descendant axis //)
Motivation
• Cannot be solved by fingerprinting alone.
• Exhaustive search is not a feasible solution.
Idea
• Lookup proceeds top-down, which yields fewer intermediate results than bottom-up.
• Use the peer's sub-fragment information for early path detection.
XPath Expression Lookup (containing descendant axis //)
Query: $/s_1/s_2/\ldots/s_i\,//\,s_j/s_k/\ldots/s_{n-1}/s_n$
• A) Linear path expression (the part before //): solved by composing exact-match and partial-match lookups. [Context node finding]
• B) Part with descendant axis (from // onward): solved with the optimistic step-wise algorithm. [From the context node]
Optimistic step-wise algorithm
• Look for $s_j$ in an arbitrary peer, in its local fragment and related path expressions.
• $s_j$ can be found in the following locations:
• Contained in the fragment. [can be retrieved; proceed]
• Intermediate step of a related path expression. [$s_j$ is already evaluated]
• Last step of a related path expression. [promising path expression, new direction to explore]
• Not in any of the related path expressions. [may be a new direction to explore with sub]
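The four locations listed above amount to a classification of one step against a peer's local information. The sketch below only performs that classification. It does not implement the full algorithm, and the function name `locate`, the report keys and the tie-breaking rule (last step wins over intermediate step) are assumptions made for the illustration.

```python
def split_steps(path):
    return [s for s in path.split("/") if s]

def locate(step, fragment_steps, related_paths):
    """Classify where `step` occurs: in the local fragment and in each related path (ps, pc)."""
    report = {
        "in_fragment": step in fragment_steps,  # can be retrieved locally
        "last": [],          # promising path expressions to explore
        "intermediate": [],  # step already evaluated along this path
        "absent": [],        # step does not occur in this path
    }
    for path in related_paths:
        steps = split_steps(path)
        if steps and steps[-1] == step:
            report["last"].append(path)
        elif step in steps[:-1]:
            report["intermediate"].append(path)
        else:
            report["absent"].append(path)
    return report

# A peer holding /site/regions/namerica, with its ps and one pc as related paths.
fragment_steps = {"namerica", "item[1]", "quantity"}
related = ["/site", "/site/regions/namerica/item[1]/descr"]

print(locate("descr", fragment_steps, related))
print(locate("regions", fragment_steps, related))
```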
Distributed Search Mechanism: Components
Schema
• Static: one implicit schema, globally known. P2P content sharing: Gnutella, KaZaA etc.
• Quasi-static: several schemas, occasionally created; administratively scoped. Service discovery: Jini, SLP, Salutation
• Dynamic: heterogeneous schemas; user scoped, semantic mapping needed. PDBS: PeerDB, XP2P, RDFPeers etc.
[Figure: component diagram; labels: Routing, Query Language, Translation, Query Semantics, Schema (Static, Quasi-static, Dynamic)]
Distributed Search Mechanism: Components
Expressiveness
• Exact keyword: DHT-based P2P content sharing. E.g., CFS, eMule
• Partial keyword: unstructured P2P. E.g., Gnutella, FastTrack
• Property-value list: most service discovery protocols (SDPs). E.g., Jini, Salutation, UPnP
• Complex queries: PDBSs and some SDPs; hierarchical, relational op. and ranges. E.g., PeerDB, XP2P, SLP, Twine
[Figure: component diagram; labels: Routing, Query Language, Translation, Query Semantics, Schema (Static, Quasi-static, Dynamic), Expressiveness (Exact keyword, Partial keyword, Property-value list, Complex queries)]
Distributed Search Mechanism: Components
Translation and routing
• Flat / Content-routing: preserves semantic information in the query for use in routing decisions at each hop. E.g., semi-structured P2P and industrial SDP
• Hash / Address-routing: hashing loses semantic information; key-to-address (of target) mapping. E.g., DHT techniques, SkipNet
• Hash-summary / Signature-routing: query semantics are preserved; Bloom-filter based, with lossy aggregation. E.g., SSDS, NSS, DPMS, PLR
[Figure: component diagram; labels: Routing, Query Language, Translation, Query Semantics, Schema (Static, Quasi-static, Dynamic), Expressiveness (Exact keyword, Partial keyword, Property-value list, Complex queries)]
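The Bloom-filter idea behind hash-summary / signature-routing can be illustrated with a small filter: a peer summarizes its keywords in a bit vector, and summaries from several peers are aggregated by OR-ing them, which is lossy because the result no longer says which peer held which keyword. This is a generic sketch, not the design of any of the systems named above; the class name, sizes and sample keywords are arbitrary.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions per item in an m-bit vector."""

    def __init__(self, m_bits=256, k=3):
        self.m, self.k, self.bits = m_bits, k, 0

    def _positions(self, item):
        # Derive k positions by salting SHA-1 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha1(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:4], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        """False means definitely absent; True means possibly present."""
        return all((self.bits >> p) & 1 for p in self._positions(item))

    def union(self, other):
        """Lossy aggregation of two summaries (same m and k assumed)."""
        merged = BloomFilter(self.m, self.k)
        merged.bits = self.bits | other.bits
        return merged

peer_a, peer_b = BloomFilter(), BloomFilter()
peer_a.add("book")
peer_b.add("magazine")
summary = peer_a.union(peer_b)
print(summary.might_contain("book"), summary.might_contain("magazine"))  # True True
```

A router holding `summary` can forward a query for "book" toward this group of peers, and can safely skip the group whenever `might_contain` returns False.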
Distributed Search Mechanism: Components
[Figure: complete component diagram; labels as they appear, grouped by component]
• Query: Language, Translation, Semantics
• Routing: Content-routing, Address-routing, Signature-routing
• Translation: Flat, Hash, Hash-summary
• Schema: Static, Quasi-static, Dynamic
• Expressiveness: Exact keyword, Partial keyword, Property-value list, Complex queries
• Topology: Central, Partially Decentralized, Pure Decentralized
• Index: Unstructured, Semi-structured, Structured