Capturing and Querying Multiple Aspects of Semistructured Data

Curtis Dyreson, (formerly) Dept. of Comp. Sci., James Cook University
Michael Böhlen, Christian S. Jensen, Nykredit Center for Database Research, Department of Computer Science, Aalborg University
www.cs.auc.dk/NDB



  1. Capturing and Querying Multiple Aspects of Semistructured Data
     Curtis Dyreson, (formerly) Dept. of Comp. Sci., James Cook University
     Michael Böhlen, Christian S. Jensen, Nykredit Center for Database Research, Department of Computer Science, Aalborg University
     www.cs.auc.dk/NDB
     VLDB '99, Edinburgh, Scotland

  2. Outline
     • meta-data
     • representation
         • properties
     • queries
         • collapse
         • match
         • coalesce
     • AUCQL
     • summary

  3. Meta-data
     • database meta-data
         • schema, security, transaction time
     • web meta-data
         • author, language, subject (Dublin Core), privacy
     • web 'meta-data' standards
         • RDF, P3P
     • intrinsic
         • informational, but also exclusional
     • irregular
         • ad-hoc

  4. Movie database
     • movie data
         • Bruce Willis stars in Colour of Night.
         • Colour of Night premiered 1/Jul/1995.
     • publication meta-data
         • language: English
         • URL: http://www.auc.dk
         • publication date: 2/Apr/1997
         • privacy/security: 'over 18'
         • publication history: v1.2, modified 31/Jul/1998
         • subject: Film, Suspense, Thriller
     • queries
         • Retrieve information published at Danish web sites.
         • Find reviews published in the first week of the movie's release.
         • Get suspense films starring Bruce Willis.

  5. A semistructured database ...
     • database
         • edges with labels
         • nodes
         • values
     • meta-data
         • schema
         • security
         • language
         • URL
         • subject
         • time
     [Figure: example graph with nodes &1 and &2, edge labels movie, star, age, Oscars and name, and the value Bruce Willis]

  6. Properties
     • property name: property value
     • default name property
     • A label is a set of properties.
     [Figure: three drawings of the edge from &1 to the value Colour of Night, labeled in turn "title", "name: title", and "name: title, URL: www.movie.com"]

  7. Label semistructure
     • meta-meta-data: Joe authored the URL meta-data
     [Figure: the edge from &1 to Colour of Night carries the label (name: title, URL: www.movie.com); the label is itself drawn as a small graph with edges name, URL and author and values title, www.movie.com and Joe]

  8. Properties (continued)
     • required properties
     • missing properties
     [Figure: the edge from &1 to &2 is labeled (name: movie, security! over 18) and the edge from &2 to Colour of Night is labeled (name: title, URL: www.movie.com). Annotations: "security!" is required; the first label is missing the URL property; the second label is missing the security property]

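  9. Property semantics
     • transaction time example
     [Figure: the edge from &1 to &2 is labeled (name: reviewed, trans. time: [1/Sep/1999 - uc]); the edge from &2 to &3 is labeled (name: movie); two edges leave &3: (name: title, trans. time: [1/Aug/1998 - uc]) to Colour of Night and (name: title, trans. time: [2/Apr/1997 - 31/Jul/1998]) to Color of Night. Annotation "Not a path!" marks the route ending at Color of Night, whose transaction time does not intersect that of the reviewed edge]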

  10. Using an existing model
      • meta-data and data edges
      • retrieve titles of reviewed movies
          SELECT X.data
          FROM reviewed R, R.movie M, M.title X
          WHERE R.metadata.transtime INTERSECT M.metadata.transtime
            AND M.metadata.transtime INTERSECT X.metadata.transtime
      [Figure: the edge title leads from &1 to &2; from &2 the edge metadata leads to &3, which has a transtime edge to the value 1/Aug/1998 - uc, and the edge data leads to the value Colour of Night]

  11. Design flaws
      • query must enforce semantics to avoid fictive results
          SELECT X.data FROM *.title X
      • wildcard unintentionally accesses meta-data
      • no means of enforcing required properties
      • even correctly formed queries are brittle
          • user guesses at meta-data encoding

  12. Outline
      • meta-data
      • representation
          • properties
      • queries
          • collapse
          • match
          • coalesce
      • AUCQL
      • summary

  13. Shortest paths
      • Collapse: sum
      • Coalesce: min
      [Figure: weighted graph with edge weights 3, 5, 12 and 17; collapsed path sums 15 and 22; coalesced minimum 15]
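
The shortest-path analogy on this slide can be sketched in a few lines of Python. This is only an illustration: the pairing of edge weights into two paths (3 + 12 and 5 + 17) is an assumption, chosen because it is the only pairing that yields the sums 15 and 22 shown in the figure, and the function names are hypothetical.

```python
# Assumed reading of the figure: two alternative paths between the same nodes.
paths = [[3, 12], [5, 17]]


def collapse(path):
    """Collapse a path to a single edge; for shortest paths this is a sum."""
    return sum(path)


def coalesce(values):
    """Coalesce alternative edges into one value; for shortest paths this is min."""
    return min(values)


collapsed = [collapse(p) for p in paths]
shortest = coalesce(collapsed)
print(collapsed, shortest)
```

Collapse works along one path, while Coalesce works across parallel alternatives; the later slides apply the same two steps to property values such as names and transaction times.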

  14. Collapse
      • Collapse the information along a path to a single edge.
      [Figure: the edge from &1 to &2 is labeled (name: reviewed, trans. time: [1/Sep/1999 - uc]); the edge from &2 to &3 is labeled (name: movie); two edges leave &3: (name: title, trans. time: [1/Aug/1998 - uc]) to Colour of Night and (name: title, trans. time: [2/Apr/1997 - 31/Jul/1998]) to Color of Night. The two collapsed edges from &1 to the values are marked "?"]

  15. Collapse example
      • PropertyCollapse for name is concatenation, for trans. time it is temporal intersection.
      [Figure: same graph as on the previous slide. The collapsed edge to Colour of Night is labeled (name: reviewed.movie.title, trans. time: [1/Sep/1999 - uc]); the collapsed edge to Color of Night is labeled (name: reviewed.movie.title, trans. time: undefined)]

  16. Match (retrieval)
      • find paths that meet some condition(s)
      • path regular expression
          • role - exact match, e.g., title
          • regular expression operators (.|?*+)
              (reviewed.movie)*.(title | name)
      • only label matching changes
          • labels are sets of properties
          • required properties
          • values may be from non-string domains, use PropertyMatch

  17. LabelMatch example
      • query role: (name! movie, trans. time: [now - now])
      • label in database: (name: movie, security! over 18, URL: www.movie.com)
      • name property - 'movie' compares to 'movie', continue
      • transaction time property - missing in target, continue
      • URL property - missing in query, continue
      • security property - required by database, no match!
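
The four steps of this example suggest a simple matching rule, sketched below in Python. This is a reading of the slide rather than the authors' implementation: the rule (a property present on both sides must satisfy PropertyMatch, and a property present on one side only blocks the match if that side marks it as required) is inferred from the example, and the names `Prop` and `label_match` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prop:
    """A property value plus the 'required' flag (written '!' on the slides)."""
    value: object
    required: bool = False


def label_match(query, target, property_match=None):
    """Return True if the query role matches the database label.

    query and target map property names to Prop. property_match optionally
    maps a property name to a comparison function; equality is the default.
    """
    property_match = property_match or {}
    for name in sorted(set(query) | set(target)):
        q, t = query.get(name), target.get(name)
        if q is not None and t is not None:
            matches = property_match.get(name, lambda a, b: a == b)
            if not matches(q.value, t.value):
                return False
        else:
            # Present on one side only: fails only if that side requires it.
            present = q if q is not None else t
            if present.required:
                return False
    return True


# The example from the slide.
query_role = {
    "name": Prop("movie", required=True),
    "trans. time": Prop("[now - now]"),
}
database_label = {
    "name": Prop("movie"),
    "security": Prop("over 18", required=True),
    "URL": Prop("www.movie.com"),
}
print(label_match(query_role, database_label))
```

The loop visits at most every property of the two labels once, in line with the O(m) cost quoted for LabelMatch on the next slide.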

  18. Retrieval queries
      • retrieval queries
          • replace only LabelMatch
          • test validity of each path with Collapse
      • cost
          • LabelMatch now O(m) where m is number of properties
          • Collapse is O(m*n) where n is length of path
      • backwards compatible
          • implicit name property
          • LabelMatch is string comparison
          • Collapse can be ignored
          • both kinds of labels can coexist

  19. Additional operations
      • Coalesce - compute a distributed property value
      [Figure: two edges lead from &1 to &2, labeled (name: review, security! subscriber, trans. time: [16/Jul/1999 - uc]) and (name: review, security! developer, trans. time: [1/Jul/1999 - 15/Jul/1999]); the coalesced value is trans. time: [1/Jul/1999 - uc]]
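
For transaction time, the coalesced value in the figure can be reproduced by merging overlapping or adjacent intervals. The Python sketch below makes several assumptions not stated on the slide: intervals are closed and have day granularity, "uc" (until changed) is modelled as `date.max`, and the function name `coalesce_intervals` is hypothetical.

```python
from datetime import date, timedelta

UC = date.max  # assumption: "uc" is modelled as the largest representable date


def coalesce_intervals(intervals):
    """Merge closed (start, end) date intervals that overlap or are adjacent."""
    merged = []
    for start, end in sorted(intervals):
        if merged:
            last_start, last_end = merged[-1]
            # Check for UC first so that adding one day can never overflow.
            if last_end == UC or start <= last_end + timedelta(days=1):
                merged[-1] = (last_start, max(last_end, end))
                continue
        merged.append((start, end))
    return merged


# Transaction times of the two "review" edges from the figure.
subscriber = (date(1999, 7, 16), UC)
developer = (date(1999, 7, 1), date(1999, 7, 15))
print(coalesce_intervals([subscriber, developer]))
```

Intervals separated by a gap stay separate, so the result is in general a list of intervals rather than a single one.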

  20. Meta-data modification
      • framework is extensible
          • specify the semantics and domain.
          • Or just use it, default semantics.

                          name            trans. time      default
      domain              strings         time intervals   objects
      PropertyCollapse    concatenation   intersect        last
      PropertyMatch       =               overlaps         =
      PropertySlice       semantic error  intersect        semantic error
      PropertyCoalesce    union           coalesce         semantic error

  21. Outline
      • meta-data
      • representation
          • properties
      • queries
          • collapse
          • match
          • coalesce
      • AUCQL
      • summary

  22. AUCQL
      • Lorel SELECT statement derivative
      • example, retrieve all movie titles.
          SELECT Title FROM movie.title Title;
      • AUCQL replaces role with unordered list of properties
          SELECT Title FROM (name! movie).(name! title) Title;
      • default to required name property

  23. AUCQL (continued)
      • can use any property, retrieve current movie titles
          SELECT Title
          FROM (name! movie, trans. time: [now - now]).(name! title, trans. time: [now - now]) Title;
      • can set properties for entire query
          SET PROPERTY (trans. time: [now - now]);
          SELECT Title FROM movie.title Title;

  24. AUCQL (continued)
      • can use MATCH, COALESCE, COLLAPSE
      • example, show names along all current paths in the database
          SELECT PROPERTY(name, COLLAPSE(All))
          FROM (trans. time: [now - now])* All;
        result, e.g.,
          reviewed
          reviewed.movie
          reviewed.movie.title
          …

  25. Summary
      • meta-data
      • representation
          • labels with properties
          • property semantics
      • new query operations
          • extensible
      • AUCQL website
          • implemented research prototype
          • free, downloadable, Unix environment
          • http://www.cs.auc.dk/~curtis/AUCQL
          • interactive query engine
          • tutorials

  26. Related work
      • Lorel (Abiteboul et al., JDL '97)
      • non-simple labels
          • Chlorel/DOEM (Chawathe et al., ICDE '98)
          • Deterministic Paths (Buneman et al., ICDT '99)
      • RDF query languages (QL '98)
          • Query Service for RDF (Decker et al.)
          • P3P (Cranor)
          • RDF Query Specification (Malhotra and Sundaresan)

  27. Future work
      • XML/RDF/DCD translation
          • labels can share common properties
          • no container termination
          • property terminators?
          • recursive semi-structured labels
      • heterogeneous meta-data
          • does security mean security?
          • AUCQL has single property name space
          • dynamic scoping of properties
          • property semantics keyed to single property name

  28. Future work (continued)
      • soundness and completeness
          • incomplete with respect to graph operations
          • minimal set of operations
          • information preserving? property-specific basis
      • design guidelines for property semantics
      • implementation
          • path indexing (when labels have properties)
          • query optimization

  29. Collapse mechanics
      • collapse pair-wise along path
      • LabelCollapse: Label × Label → Label
          for each property in either label
              if the property is in both labels then apply PropertyCollapse
              else add it to the result
      • PropertyCollapse is a property-specific constructor T × T → T ∪ {undefined}
      • required properties stay required
      • path is valid if no property is undefined
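
The mechanics above can be sketched in Python and checked against the collapse example from slide 15. This is a minimal sketch under stated assumptions, not the AUCQL implementation: labels are modelled as dicts from property name to a (value, required) pair, "uc" is modelled as `date.max`, intervals are closed, an undefined value stays undefined once produced, and all function names are hypothetical. The fallback "last" follows the default PropertyCollapse listed on slide 20.

```python
from datetime import date
from functools import reduce

UNDEFINED = "undefined"  # sentinel for the extra value in T ∪ {undefined}
UC = date.max            # assumption: "uc" modelled as the largest date


def collapse_name(a, b):
    """PropertyCollapse for name: concatenation."""
    return a + "." + b


def collapse_time(a, b):
    """PropertyCollapse for trans. time: intersection of closed intervals."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else UNDEFINED


PROPERTY_COLLAPSE = {"name": collapse_name, "trans. time": collapse_time}


def label_collapse(left, right):
    """Collapse two labels; each maps a property name to (value, required)."""
    result = {}
    for prop in left.keys() | right.keys():
        if prop in left and prop in right:
            (lv, lreq), (rv, rreq) = left[prop], right[prop]
            if lv is UNDEFINED or rv is UNDEFINED:
                value = UNDEFINED
            else:
                # Default semantics ("last") keeps the later value.
                value = PROPERTY_COLLAPSE.get(prop, lambda a, b: b)(lv, rv)
            result[prop] = (value, lreq or rreq)  # required stays required
        else:
            result[prop] = left[prop] if prop in left else right[prop]
    return result


def collapse_path(labels):
    """Collapse pair-wise along the path, left to right."""
    return reduce(label_collapse, labels)


def is_valid(label):
    """A path is valid if no property of its collapsed label is undefined."""
    return all(value is not UNDEFINED for value, _ in label.values())


reviewed = {"name": ("reviewed", False),
            "trans. time": ((date(1999, 9, 1), UC), False)}
movie = {"name": ("movie", False)}
title_new = {"name": ("title", False),
             "trans. time": ((date(1998, 8, 1), UC), False)}
title_old = {"name": ("title", False),
             "trans. time": ((date(1997, 4, 2), date(1998, 7, 31)), False)}

current = collapse_path([reviewed, movie, title_new])
stale = collapse_path([reviewed, movie, title_old])
print(current["name"][0], is_valid(current), is_valid(stale))
```

Each pair-wise step touches every property once, which matches the O(m*n) cost quoted for Collapse on a path of length n with m properties.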
