Capturing and Querying Multiple Aspects of Semistructured Data
Curtis Dyreson, (formerly) Dept. of Comp. Sci., James Cook University
Michael Böhlen, Christian S. Jensen, Nykredit Center for Database Research, Department of Computer Science, Aalborg University
www.cs.auc.dk/NDB
VLDB '99, Edinburgh, Scotland
Outline
• meta-data
• representation
  • properties
• queries
  • collapse
  • match
  • coalesce
• AUCQL
• summary
Meta-data
• database meta-data
  • schema, security, transaction time
• web meta-data
  • author, language, subject (Dublin Core), privacy
• web 'meta-data' standards
  • RDF, P3P
• intrinsic
• informational, but also exclusional
• irregular
• ad-hoc
Movie database
• movie data
  • Bruce Willis stars in Colour of Night.
  • Colour of Night premiered 1/Jul/1995.
• publication meta-data
  • language: English
  • URL: http://www.auc.dk
  • publication date: 2/Apr/1997
  • privacy/security: 'over 18'
  • publication history: v1.2, modified 31/Jul/1998
  • subject: Film, Suspense, Thriller
• queries
  • Retrieve information published at Danish web sites.
  • Find reviews published in the first week of the movie's release.
  • Get suspense films starring Bruce Willis.
A semistructured database ...
• database
  • edges with labels
  • nodes
  • values
• meta-data
  • schema
  • security
  • language
  • URL
  • subject
  • time
Figure: example graph with edges labeled movie, star, age, Oscars, and name, nodes &1 and &2, and the value Bruce Willis.
Properties
• property name: property value
• default name property
• A label is a set of properties.
Figure: three versions of the edge from &1 to Colour of Night, labeled title, then name: title, then name: title with URL: www.movie.com.
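A minimal Python sketch of the label idea on this slide, assuming a label can be held as a dictionary from property names to property values. The helper `as_label` is hypothetical and not part of AUCQL; it only illustrates how a bare label such as title can be read as the default name property.

```python
def as_label(spec):
    """Return a label as a dict of property name -> property value.

    Hypothetical helper: a bare string is treated as the value of the
    default `name` property; a dict of properties is copied unchanged.
    """
    if isinstance(spec, str):
        return {"name": spec}
    return dict(spec)


plain = as_label("title")
explicit = as_label({"name": "title"})
with_url = as_label({"name": "title", "URL": "www.movie.com"})

print(plain == explicit)  # True: both carry only name: title
print(sorted(with_url))   # ['URL', 'name']
```

Keeping the plain-string form as shorthand for the name property is what lets ordinary labels and property labels sit in the same graph.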
Label semistructure
• meta-meta-data: Joe authored the URL meta-data
Figure: the edge from &1 to Colour of Night carries the label (name: title, URL: www.movie.com); the label is itself drawn as a small graph in which the URL property is annotated with author: Joe.
Properties (continued)
• required properties
• missing properties
Figure: &1 has an edge (name: movie, security! over 18) to &2, and &2 has an edge (name: title, URL: www.movie.com) to Colour of Night. Callouts: the ! marks security as required; the first edge is missing the URL property; the second edge is missing the security property.
Property semantics
• transaction time example
Figure: &1 has an edge (name: reviewed, trans. time: [1/Sep/1999 - uc]) to &2; &2 has an edge (name: movie) to &3; &3 has two title edges, one (trans. time: [1/Aug/1998 - uc]) to Colour of Night and one (trans. time: [2/Apr/1997 - 31/Jul/1998]) to Color of Night. The path reviewed.movie.title ending at Color of Night is marked 'Not a path!'.
Using an existing model
• meta-data and data edges
• retrieve titles of reviewed movies
    SELECT X.data
    FROM reviewed R, R.movie M, M.title X
    WHERE R.metadata.transtime INTERSECT M.metadata.transtime
      AND M.metadata.transtime INTERSECT X.metadata.transtime
Figure: &1 has a title edge to &2; &2 has a metadata edge to &3 and a data edge to Colour of Night; &3 has a transtime edge to 1/Aug/1998 - uc.
Design flaws
• query must enforce semantics to avoid fictive results
• wildcard unintentionally accesses meta-data
    SELECT X.data FROM *.title X
• no means of enforcing required properties
• even correctly formed queries are brittle
• user guesses at meta-data encoding
Shortest paths
• Collapse: sum
• Coalesce: min
Figure: graph with edge weights 3, 5, 12, and 17; the collapsed path values are 15 and 22, and the coalesced value is 15.
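The shortest-path analogy can be checked with a few lines of Python. This sketch assumes the figure shows two alternative paths with edge weights 3 + 12 and 5 + 17, a pairing inferred from the totals 15 and 22 on the slide rather than stated there.

```python
# Assumed pairing of the slide's edge weights into two alternative paths.
paths = [[3, 12], [5, 17]]

# Collapse: combine the values along each path (here, by summing).
collapsed = [sum(path) for path in paths]

# Coalesce: combine the collapsed values of alternative paths (here, by min).
shortest = min(collapsed)

print(collapsed)  # [15, 22]
print(shortest)   # 15
```

Swapping sum and min for other functions is the point of the analogy: Collapse works along a path, Coalesce works across paths.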
Collapse
• Collapse the information along a path to a single edge.
Figure: &1 has an edge (name: reviewed, trans. time: [1/Sep/1999 - uc]) to &2; &2 has an edge (name: movie) to &3; &3 has title edges (trans. time: [1/Aug/1998 - uc]) to Colour of Night and (trans. time: [2/Apr/1997 - 31/Jul/1998]) to Color of Night. The two collapsed edges are marked '?'.
Collapse example
• PropertyCollapse for name is concatenation; for trans. time it is temporal intersection.
Figure: the same graph as on the Collapse slide. The path to Colour of Night collapses to (name: reviewed.movie.title, trans. time: [1/Sep/1999 - uc]); the path to Color of Night collapses to (name: reviewed.movie.title, trans. time: undefined).
Match (retrieval)
• find paths that meet some condition(s)
• path regular expression
  • role - exact match, e.g., title
  • regular expression operators (.|?*+)
      (reviewed.movie)*.(title | name)
• only label matching changes
  • labels are sets of properties
  • required properties
  • values may be from non-string domains, use PropertyMatch
LabelMatch example
• name property - 'movie' compares to 'movie', continue
• transaction time property - missing in target, continue
• URL property - missing in query, continue
• security property - required by database, no match!
query role: name! movie, trans. time: [now - now]
label in database: name: movie, security! over 18, URL: www.movie.com
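The four steps of the example can be sketched in Python. This is a rough reading of the slide, not the AUCQL implementation: the `Prop` class, the `label_match` function, and the choice of plain equality as the fallback PropertyMatch are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prop:
    """A property value plus a flag for the '!' (required) marker."""
    value: object
    required: bool = False


def label_match(query, target, property_match=None):
    """Return True if a query label matches a database label.

    Assumed rule, read off the slide: a property present in both labels
    must pass its PropertyMatch; a property present in only one label
    blocks the match only if it is required there.
    """
    property_match = property_match or {}
    for name in query.keys() | target.keys():
        q, t = query.get(name), target.get(name)
        if q is not None and t is not None:
            # Fallback PropertyMatch is plain equality (an assumption).
            match = property_match.get(name, lambda a, b: a == b)
            if not match(q.value, t.value):
                return False
        else:
            present = q if q is not None else t
            if present.required:
                return False
    return True


query = {"name": Prop("movie", required=True),
         "trans. time": Prop(("now", "now"))}
database = {"name": Prop("movie"),
            "security": Prop("over 18", required=True),
            "URL": Prop("www.movie.com")}
public = {"name": Prop("movie"), "URL": Prop("www.movie.com")}

print(label_match(query, database))  # False: security is required
print(label_match(query, public))    # True
```

The loop visits every property of either label once, which is in line with the O(m) cost quoted for LabelMatch.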
Retrieval queries
• retrieval queries
  • replace only LabelMatch
  • test validity of each path with Collapse
• cost
  • LabelMatch is now O(m), where m is the number of properties
  • Collapse is O(m*n), where n is the length of the path
• backwards compatible
  • implicit name property
  • LabelMatch is string comparison
  • Collapse can be ignored
  • both kinds of labels can coexist
Additional operations
• Coalesce - compute a distributed property value
Figure: two edges from &1 to &2, (name: review, security! subscriber, trans. time: [16/Jul/1999 - uc]) and (name: review, security! developer, trans. time: [1/Jul/1999 - 15/Jul/1999]), coalesce to trans. time: [1/Jul/1999 - uc].
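For transaction time, the coalesced value in the figure can be reproduced with a short Python sketch. It assumes closed intervals at day granularity, so that an interval ending on 15/Jul/1999 is adjacent to one starting on 16/Jul/1999, and it uses `date.max` as a stand-in for uc. The function name `coalesce_intervals` is hypothetical.

```python
from datetime import date, timedelta

UC = date.max  # stand-in for "uc" (until changed); an assumption


def coalesce_intervals(intervals):
    """Merge overlapping or adjacent closed [start, end] day intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged:
            last_start, last_end = merged[-1]
            # Checking for UC first avoids adding a day to date.max.
            if last_end == UC or start <= last_end + timedelta(days=1):
                merged[-1] = (last_start, max(last_end, end))
                continue
        merged.append((start, end))
    return merged


subscriber = (date(1999, 7, 16), UC)
developer = (date(1999, 7, 1), date(1999, 7, 15))

result = coalesce_intervals([subscriber, developer])
print(result == [(date(1999, 7, 1), UC)])  # True
```

Intervals separated by a gap stay separate, so the result is a list rather than a single interval.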
Meta-data modification
• framework is extensible
• specify the semantics and domain
• or just use it with the default semantics

                   name             trans. time      default
domain             strings          time intervals   objects
PropertyCollapse   concatenation    intersect        last
PropertyMatch      =                overlaps         =
PropertySlice      semantic error   intersect        semantic error
PropertyCoalesce   union            coalesce         semantic error
AUCQL
• Lorel SELECT statement derivative
• example: retrieve all movie titles
    SELECT Title FROM movie.title Title;
• AUCQL replaces role with unordered list of properties
    SELECT Title FROM (name! movie).(name! title) Title;
• default to required name property
AUCQL (continued)
• can use any property, e.g., retrieve current movie titles
    SELECT Title
    FROM (name! movie, trans. time: [now - now]).(name! title, trans. time: [now - now]) Title;
• can set properties for entire query
    SET PROPERTY (trans. time: [now - now]);
    SELECT Title FROM movie.title Title;
AUCQL (continued)
• can use MATCH, COALESCE, COLLAPSE
• example: show names along all current paths in the database
    SELECT PROPERTY(name, COLLAPSE(All))
    FROM (trans. time: [now - now])* All;
• result, e.g.,
    reviewed
    reviewed.movie
    reviewed.movie.title
    …
Summary
• meta-data
• representation
  • labels with properties
  • property semantics
• new query operations
• extensible
• AUCQL website
  • implemented research prototype
  • free, downloadable, Unix environment
  • http://www.cs.auc.dk/~curtis/AUCQL
  • interactive query engine
  • tutorials
Related work
• Lorel (Abiteboul et al., JDL '97)
• non-simple labels
• Chlorel/DOEM (Chawathe et al., ICDE '98)
• Deterministic Paths (Buneman et al., ICDT '99)
• RDF query languages (QL '98)
• Query Service for RDF (Decker et al.)
• P3P (Cranor)
• RDF Query Specification (Malhotra and Sundaresan)
Future work
• XML/RDF/DCD translation
• labels can share common properties
• no container termination
• property terminators?
• recursive semi-structured labels
• heterogeneous meta-data
• does security mean security?
• AUCQL has single property name space
• dynamic scoping of properties
• property semantics keyed to single property name
Future work (continued)
• soundness and completeness
  • incomplete with respect to graph operations
  • minimal set of operations
  • information preserving? property-specific basis
• design guidelines for property semantics
• implementation
  • path indexing (when labels have properties)
  • query optimization
Collapse mechanics
• collapse pair-wise along path
• LabelCollapse: Label × Label → Label
    for each property in either label
      if property is in both then apply PropertyCollapse
      else add to result
• PropertyCollapse is a property-specific constructor: T × T → T ∪ {undefined}
• required properties stay required
• path is valid if no property is undefined
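The pair-wise collapse can be sketched in Python and run on the reviewed/movie/title paths from the Collapse example slide. This is a sketch under stated assumptions, not the prototype's code: labels are plain dictionaries, required flags are left out for brevity, `date.max` stands in for uc, and properties without a registered PropertyCollapse keep the later value. All function and constant names are hypothetical.

```python
from datetime import date
from functools import reduce

UC = date.max         # stand-in for "uc" (until changed); an assumption
UNDEFINED = object()  # sentinel for an undefined property value


def collapse_name(a, b):
    """PropertyCollapse for name: concatenation."""
    return f"{a}.{b}"


def collapse_time(a, b):
    """PropertyCollapse for trans. time: temporal intersection."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else UNDEFINED


PROPERTY_COLLAPSE = {"name": collapse_name, "trans. time": collapse_time}


def label_collapse(first, second):
    """Collapse two adjacent labels into one."""
    result = {}
    for prop in first.keys() | second.keys():
        if prop in first and prop in second:
            # Assumed fallback: keep the later label's value.
            collapse = PROPERTY_COLLAPSE.get(prop, lambda a, b: b)
            result[prop] = collapse(first[prop], second[prop])
        else:
            result[prop] = first[prop] if prop in first else second[prop]
    return result


def collapse_path(labels):
    """Collapse pair-wise along a non-empty path of labels."""
    return reduce(label_collapse, labels)


def is_valid(label):
    """A path is valid if no collapsed property is undefined."""
    return all(value is not UNDEFINED for value in label.values())


reviewed = {"name": "reviewed", "trans. time": (date(1999, 9, 1), UC)}
movie = {"name": "movie"}
colour = {"name": "title", "trans. time": (date(1998, 8, 1), UC)}
color = {"name": "title",
         "trans. time": (date(1997, 4, 2), date(1998, 7, 31))}

good = collapse_path([reviewed, movie, colour])
bad = collapse_path([reviewed, movie, color])
print(good["name"], is_valid(good))  # reviewed.movie.title True
print(is_valid(bad))                 # False
```

Each pair-wise step touches every property once, so a path of n labels with m properties costs on the order of m*n, matching the figure given on the Retrieval queries slide.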