280 likes | 371 Views
Cooperative Query Answering for Semistructured Data. Speakers: Chuan Lin & Xi Zhang. By Michael Barg and Raymond K. Wong. Outline. Motivations Overview Basic Concepts Cooperative Query Processing Experiment. Motivations. XML data same semantic content very different structures.
E N D
Cooperative Query Answering for Semistructured Data Speakers: Chuan Lin & Xi Zhang By Michael Barg and Raymond K. Wong
Outline • Motivations • Overview • Basic Concepts • Cooperative Query Processing • Experiment
Motivations • XML data • same semantic content • very different structures
Court Transcript: plaintiff User Query: woman “insurance claims” related to “smoking” for “woman” insurance claim smoking Insurance Record: insurance claim insurer smoking woman Example: same semantics, diff structures
Data: personnel User Query: “phone number” of “Bob” Who is the new “sales manager” sales manager salesman assistant sales manager salesman Joe Bob phone number phone number Motivations • No exact query result
Overview • Goal: • Return approximate answers for XML queries • “approximate”: semantic + structural similar • Solution: • Return a set of results • ranked by an overall score • score: indicates how well the subgraph containing the result satisfies the query criteria.
Basic Concepts: Query Tree Query: /restaurant[.//Soho]/phone_number Query Tree: Result Term restaurant t h h soho t phone_number r For each edge: “head”: the end which is closer to nearest result term “end”: the other end In case of tie, “head” is the end closer to root
Basic Concepts: Converging Order • Order of edges considered in query processing • Converge on a result term
shopping_ center restaurant soho soho restaurant restaurant soho soho restaurant eating_ places address restaurant soho (a) (b) (c) (d) (e) Basic Concepts: Similarity • Semantically similar topologies
Basic Concepts: Similarity (cont.) • Deviation Proximity (DP) • Measure how far one structure deviates from a desired structure • Given: • ra: data node with value a • rb: data node with value b • Q(a,b): query tree edge • DP: the actual position of rb to the nearest position, r’b, which satisfies the topological relationship specified by Q(a,b) • Topological relationship: parent-child, ancestor-descendent
restaurant soho soho eating_ places restaurant Deviation Proximity Q (restaurant, soho) requires parent-child relationship shopping_ center restaurant soho restaurant soho address restaurant (soho’) (soho’) soho (soho’) (soho’) (soho’) DP(restauarent, soho): 0 1 2 3 3
restaurant soho soho eating_ places restaurant Deviation Proximity Q (restaurant, soho) requires anc-desc relationship shopping_ center restaurant soho restaurant soho address restaurant (soho’) soho (soho’) (soho’) (soho’) (soho’) DP(restauarent, soho): 0 0 2 3 3
Cooperative Query Processing • Input: a Query Tree QT, an XML Document Tree DT • Output: ordered list of <rresult_term, score> • Cooperative Query Processing • Structural proximity calculation • Progressive Score
Cooperative Query Processing (cont.) • Progressively matching edges in QT with DT • Consider edges in converging order • For each edge QT(a,b), where a is head and b is tail, get a list of <ra, score> • ra is a node in DT with value a • score is the progressive score of ra w.r.t the nearest rb • use graph encoding to calculate structural proximity of ra and rb
Structural Proximity Calculation • Encodings and Compressed Arrays • Compact • Preserve relationship to a larger graph • Facilitate distance calculations • Proximity Searching
Encodings and Compressed Arrays • Basic Concepts: • Common Node • Terminal Node • Annotated Node • Path representation • Representing Single Path • Representing Multiple Paths • Representing Multiple Elements • Compressed Arrays • Each encoding is a path/muti-path for a node/a set of nodes
Representing Single Path 1.1.1 y1 1.2.1.1.1.1 y2
Representing Multiple Paths 1.3 B .B.2.1.1 C .3 C .C.2 y3
Representing Multiple Elements .A.1.1y1 1 A .2.1.1.1.1 y2 .3 B.B.2.1.1 C.3 C.C.2 y3
Drawback of Encoding • 1A.A.1B.B.1D.2E.?.2C.C.1F.2G
Proximity Searching • Multi-Element Comparison • Input: • A compressed array, caN, containing the multi-element encoding of the Near Set. • A compressed array, caF, containing the multi-path encoding or path encoding of all paths from the root to the specified element of the Find Set, EF. • output: • dist, the shortest path from EF to the closest element in Near Set
Proximity Searching MinDist=5 MinDist = 4 MinDist = 2
Progressive Score • Accumulative Deviation Proximity (DP) • Calculated from structural proximity • Boolean operator at Query Tree branches a a b b c c prog(a) = prog(b)+prog(c) prog(a) = min (prog(b),prog(c))
Experiment XML: Query: //restaurant/soho Query Result: <soho, 2> <soho, 3> <soho, 4>