TOSS: An Extension of TAX with Ontologies and Similarity Queries • Edward Hung, Yu Deng, V.S. Subrahmanian • Department of Computer Science, University of Maryland, College Park • SIGMOD, Paris, France, June 2004
Outline • Introduction • Ontologies and Integration • Similarity Enhanced Ontology (SEO) • TOSS Algebra • Implementation and Experiments • Related Work
Introduction • [Jagadish et al., TAX: A Tree Algebra in XML, in DBPL, 2001] • one of the best algebras developed for XML databases
SIGMOD Problems! DBLP
Problems • Lack of lexical semantics in answering queries • Find papers written by “J. Ullman”: • J.D. Ullman? Jeffrey Ullman? • Find papers with at least one author from “U.S. government”: • U.S. Census Bureau? U.S. Army? • High precision, poor recall • $\text{Quality} = (\text{recall} \times \text{precision})^{1/2}$
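The quality measure on this slide is the geometric mean of recall and precision, so a query answer with perfect precision still scores poorly when recall is low. A minimal sketch follows; the function name `quality` and the sample numbers are assumptions for illustration, not figures from the talk.

```python
import math


def quality(recall: float, precision: float) -> float:
    """Geometric mean of recall and precision, as defined on the slide."""
    if not (0.0 <= recall <= 1.0 and 0.0 <= precision <= 1.0):
        raise ValueError("recall and precision must lie in [0, 1]")
    return math.sqrt(recall * precision)


# Hypothetical numbers: every returned answer is correct (precision 1.0),
# but only a quarter of the relevant papers are found (recall 0.25).
exact_match = quality(recall=0.25, precision=1.0)
print(exact_match)  # 0.5
```

With these assumed inputs, exact string matching reaches a quality of only 0.5 despite never returning a wrong answer.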
Our approach • Goal: extend and enhance the semantics of TAX to return high quality answers using ontology and similarity measures • capture inter-term lexical relationships by ontology and integrate ontologies of different DBs • use existing similarity measures to enhance the integrated ontology • TOSS: extend TAX algebra to query with ontology and similarity
Motivating Examples and TAX • DBLP and SIGMOD bibliographies in XML • TAX • selection • projection • product
Pattern tree • Selection
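Ontology • Suppose $\Sigma$ is some finite set of strings and $S$ is some set. An ontology w.r.t. $\Sigma$ is a partial mapping $\Theta$ from $\Sigma$ to hierarchies for $S$ • $S = \{\text{article}, \text{author}, \text{title}\}$ • $\Sigma = \{\text{part\_of}\}$ • $\leq_H = \{(\text{author}, \text{article}), (\text{title}, \text{article})\}$ • $\Theta(\text{part\_of}) = (H, \leq_H)$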
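The definition above can be sketched as a dictionary from relation names to hierarchies. This is only an illustration under stated assumptions: the class name `Hierarchy`, the method `leq`, and the choice to store the ordering as explicit pairs and close it reflexively and transitively on demand are all hypothetical, not taken from the TOSS implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hierarchy:
    """A set of nodes plus ordering pairs (lower, upper)."""
    nodes: frozenset
    edges: frozenset

    def leq(self, a: str, b: str) -> bool:
        """True if a <= b in the reflexive-transitive closure of edges."""
        if a not in self.nodes or b not in self.nodes:
            return False
        seen, stack = set(), [a]
        while stack:
            cur = stack.pop()
            if cur == b:
                return True
            if cur in seen:
                continue
            seen.add(cur)
            # Follow every edge that starts at the current node.
            stack.extend(up for (lo, up) in self.edges if lo == cur)
        return False


# The example from the slide: S, and the part_of ordering over it.
S = frozenset({"article", "author", "title"})
part_of = Hierarchy(S, frozenset({("author", "article"), ("title", "article")}))

# The ontology is a partial mapping: only "part_of" has a hierarchy here.
ontology = {"part_of": part_of}

print(ontology["part_of"].leq("author", "article"))  # True
print(ontology["part_of"].leq("article", "author"))  # False
```

Using a plain dictionary keeps the mapping partial in the sense of the definition: a relation name such as `isa` that has no hierarchy is simply absent.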
Ontology Integration SIGMOD DBLP
Ontology Integration SIGMOD DBLP IC (interoperation constraints)
Ontology Integration Hierarchy graph associated with SIGMOD and DBLP
Ontology Integration Fusion of ontologies of SIGMOD and DBLP
Similarity Enhanced Ontology • A string similarity measure $d_S$ is any function that takes two strings $X, Y$ and returns a non-negative real number such that • $\forall X,\ d_S(X, X) = 0$ • $\forall X, Y,\ d_S(X, Y) = d_S(Y, X)$ • Any string similarity measure can be used, for example the Levenshtein distance, which assigns a unit cost to every edit operation. • $d_S(\text{“relation”}, \text{“relational”}) = 2$
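The Levenshtein example on this slide can be checked with a short dynamic-programming sketch. The function name `levenshtein` is an assumption; the unit cost for insertion, deletion and substitution follows the slide.

```python
def levenshtein(x: str, y: str) -> int:
    """Edit distance with unit cost for insert, delete and substitute."""
    # prev[j] holds the distance between the processed prefix of x and y[:j].
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        cur = [i]  # distance from x[:i] to the empty prefix of y
        for j, cy in enumerate(y, start=1):
            cost = 0 if cx == cy else 1
            cur.append(min(prev[j] + 1,          # delete cx
                           cur[j - 1] + 1,       # insert cy
                           prev[j - 1] + cost))  # substitute or match
        prev = cur
    return prev[-1]


print(levenshtein("relation", "relational"))  # 2
```

The function satisfies both conditions required of a string similarity measure: the distance from a string to itself is 0, and swapping the arguments does not change the result.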
Similarity Enhanced Ontology • A similarity measure is any function $d$ that takes nodes $A, B$ as input and returns a non-negative real number such that • $d(A, B) = \min_{X \in S,\, Y \in T} d_S(X, Y)$, where $d_S$ is a string similarity measure and $S, T$ are the sets of strings contained in nodes $A, B$, respectively.
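The node-level measure is the minimum string distance over all pairs drawn from the two nodes. The sketch below assumes that a node is represented simply by its set of strings and uses Levenshtein distance as the string measure; the names `node_distance` and `levenshtein` and the sample nodes are hypothetical.

```python
from itertools import product


def levenshtein(x: str, y: str) -> int:
    """Edit distance with unit cost for insert, delete and substitute."""
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        cur = [i]
        for j, cy in enumerate(y, start=1):
            cost = 0 if cx == cy else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]


def node_distance(s: set, t: set, d_s=levenshtein) -> int:
    """Minimum of d_s(X, Y) over all X in s and Y in t."""
    if not s or not t:
        raise ValueError("both nodes must contain at least one string")
    return min(d_s(x, y) for x, y in product(s, t))


# Hypothetical nodes: the closest pair is "relation" / "relational".
a = {"relation", "table"}
b = {"relational", "model"}
print(node_distance(a, b))  # 2
```

Passing the string measure as a parameter mirrors the slide's point that any string similarity measure can be plugged in.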
Similarity Enhanced Ontology • Suppose $H$ is an integrated hierarchy, $d$ is a similarity measure and $\epsilon \geq 0$. $(H', \mu)$ is a similarity enhancement of $H$ w.r.t. $d$, $\epsilon$ iff $H'$ is a hierarchy and $\mu$ is a function from $H$ to $2^{H'}$ such that: • the original partial orderings in $H$ are preserved, and no unwarranted orderings are included • all nodes mapped into the same node are similar to each other (within the threshold $\epsilon$) • two strings are similar iff they are jointly present in some node in $(H', \mu)$ • no node is redundant, i.e., no node's string set is a subset of some other node's
Similarity Enhanced Ontology An example ontology Its similarity enhancement
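The grouping conditions of a similarity enhancement (pairwise similarity inside a node, similar strings sharing some node, no node contained in another) can be illustrated on a flat set of terms. The sketch below is a deliberate simplification and not the TOSS algorithm: it ignores the partial ordering entirely, treats each term as a single string, uses Levenshtein distance with a threshold called `eps`, and computes the groups as maximal cliques of the "distance at most `eps`" graph. All names are hypothetical.

```python
from itertools import combinations


def levenshtein(x: str, y: str) -> int:
    """Edit distance with unit cost for insert, delete and substitute."""
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        cur = [i]
        for j, cy in enumerate(y, start=1):
            cost = 0 if cx == cy else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]


def similarity_groups(terms, eps: int) -> set:
    """Maximal sets of terms that are pairwise within distance eps."""
    terms = sorted(set(terms))
    adj = {t: set() for t in terms}
    for a, b in combinations(terms, 2):
        if levenshtein(a, b) <= eps:
            adj[a].add(b)
            adj[b].add(a)

    groups = []

    def expand(r: set, p: set, x: set) -> None:
        # Bron-Kerbosch without pivoting: r is the current clique,
        # p the candidates that extend it, x the already-explored vertices.
        if not p and not x:
            groups.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(terms), set())
    return set(groups)


authors = ["J. Ullman", "J.D. Ullman", "Jeffrey Ullman", "U.S. Army"]
for group in sorted(similarity_groups(authors, eps=2), key=sorted):
    print(sorted(group))
```

Because similarity within a threshold is not transitive, a term may land in more than one group; maximal cliques keep every group pairwise similar while ruling out a group that is a subset of another.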
Similarity Enhanced Ontology • $(H, d, \epsilon)$ is similarity consistent iff there exists a similarity enhancement of $H$ w.r.t. $d$, $\epsilon$. • Theorem • If $(H, d, \epsilon)$ is similarity consistent, then all similarity enhancements of $H$ are equivalent.
TOSS Algebra • A simple selection condition has the form X op Y • $\mathit{op} \in \{=, \neq, <, \leq, >, \geq, \sim, \text{instance\_of}, \text{isa}, \text{part\_of}, \text{subtype\_of}, \text{above}, \text{below}\}$, and X, Y are terms, i.e., attributes (tag, content), types, or typed values $v{:}\tau$ with $v \in \mathit{dom}(\tau)$. • A selection condition is a simple selection condition or a conjunction/disjunction of two selection conditions
TOSS Algebra • The pattern tree to find the titles of all papers in DBLP related to Microsoft (independently of the field in which Microsoft appears): #1.tag = inproceedings & #2.tag = title & #3.tag part_of inproceedings & #3.content ~ “Microsoft”
TOSS Algebra • To ensure that an embedding is correct w.r.t. a semistructured DB with an associated similarity enhanced ontology, • we define a selection condition to be well-typed if X and Y have a least common supertype $\tau$ and there exist functions to convert their types to $\tau$. • we define (1) the type and value of a term w.r.t. a mapping h, and (2) the satisfaction of a selection condition • We extend the following algebraic operations: selection, projection, product, union, intersection, difference.
Implementation and Experiments • TOSS system implemented in Java • built on top of Xindice DBMS • Experiments: • Recall and precision • Scalability • selection • join
Recall and Precision • Plot legend: TAX; X = TOSS ($\epsilon = 2$); + = TOSS ($\epsilon = 3$)
Quality of Answers • Plot legend: TAX; X = TOSS ($\epsilon = 2$); + = TOSS ($\epsilon = 3$) • $\text{Quality} = (\text{recall} \times \text{precision})^{1/2}$
Related Work • Wiederhold et al. [ICOT’94, EDBT’00,…] • ontology algebra (LISP-style logical statements) • IC (interoperation constraints) are not considered • A concept similar to IC is considered in EDBT’00, but their integrated ontologies were not concise. • In addition, we deal with XML documents.
Related Work • [Jagadish et al., TAX: A Tree Algebra in XML, in DBPL, 2001] • algebra to query XML documents • ontology is not used • [Al-Khalifa et al., Querying structured text in an XML database, in SIGMOD 2003] • IR-style queries that find relevant results, with weighting and ranking support at run time • We use ontologies and similarity measures; we consider integration of ontologies and precompute the SEO.
Questions and Answers Thank you very much!