
Efficient Keyword Search over DBLife & DBLP Data

CS511 (in-progress) project presentation, Dec-09-2005. Mayssam Sayyadian, Nhung Nguyen, Hieu Li.


Presentation Transcript


  1. Efficient Keyword Search over DBLife & DBLP Data. CS511 (in-progress) project presentation, Dec-09-2005. Mayssam Sayyadian, Nhung Nguyen, Hieu Li.

  2. Introduction
     • DBLife manages unstructured data
     • People are familiar with keyword search over unstructured data
     • … but DBLife → ER graph
     • Entities, mentions, etc.: structured data is extracted
     • DBLP: a well-known, available, enriched database of publications
     • DBLife does not cover all the data in DBLP

  3. Assumptions
     • Data is in relational format, not XML
     • The DBMS provides text indexing at the column level (Oracle, SQL Server, DB2, MySQL, PostgreSQL)
     • Support for XML data is a subject of future work

  4. Basic Model
     • Database: modeled as a graph
     • Nodes = tuples
     • Edges = references between tuples (foreign keys, inclusion dependencies, ...)
     • Edges are directed
     • [Figure: example graph with paper tuples "eTuner: Tuning Schema …" and "iMAP: Discovering …", author tuples Mayssam Sayyadian, AnHai Doan, and Pedro Domingos, and writes tuples linking them]
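
The graph model on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table layout, the keys (`p1`, `a1`, `w1`, ...), the authorship rows, and the choice to point each edge from the referencing tuple to the referenced tuple are all hypothetical. The slide only says that edges are directed.

```python
from collections import defaultdict

# Hypothetical relational data: each table maps a primary key to a row.
tables = {
    "paper": {
        "p1": {"title": "eTuner: Tuning Schema …"},
        "p2": {"title": "iMAP: Discovering …"},
    },
    "author": {
        "a1": {"name": "Mayssam Sayyadian"},
        "a2": {"name": "AnHai Doan"},
        "a3": {"name": "Pedro Domingos"},
    },
    # Illustrative authorship rows; each one references a paper and an author.
    "writes": {
        "w1": {"paper_id": "p1", "author_id": "a1"},
        "w2": {"paper_id": "p1", "author_id": "a2"},
        "w3": {"paper_id": "p2", "author_id": "a3"},
    },
}

# Foreign keys as (referencing table, column, referenced table).
foreign_keys = [
    ("writes", "paper_id", "paper"),
    ("writes", "author_id", "author"),
]


def build_graph(tables, foreign_keys):
    """Return {node: set of nodes it references}; a node is (table, key)."""
    edges = defaultdict(set)
    for table, column, target in foreign_keys:
        for key, row in tables[table].items():
            # Assumed direction: referencing tuple -> referenced tuple.
            edges[(table, key)].add((target, row[column]))
    return edges


graph = build_graph(tables, foreign_keys)
```

Each tuple becomes one node, so `graph[("writes", "w1")]` holds the paper and author tuples that row `w1` references.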

  5. Answer Example
     • Query: Mayssam AnHai
     • [Figure: answer tree in which the paper tuple "eTuner: Tuning Schema …" is connected through two writes tuples to the author tuples Mayssam and AnHai Doan]

  6. Answer Model
     • Query: a set of keywords {k1, k2, ..., kn}
     • Each keyword ki matches a set of nodes Si
     • Answer: a rooted, directed tree connecting nodes, with one node from each Si
     • The root node (we call it an information node) has special significance and may be restricted to some relations, e.g. relations representing entities, not relationships
     • Multiple answers are ranked by a scoring function
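
To make the answer model concrete, here is a simplified brute-force sketch. It is not the presenters' algorithm. It assumes, for illustration only, that references can be traversed in both directions, that keywords match by case-insensitive substring, and that the best tree is the one with the fewest edges (a stand-in for the scoring function). All node names and helper functions are hypothetical.

```python
from collections import deque

# Hypothetical tuple graph: node id -> searchable text.
text = {
    "paper:1": "eTuner: Tuning Schema …",
    "author:1": "Mayssam Sayyadian",
    "author:2": "AnHai Doan",
    "writes:1": "",
    "writes:2": "",
}
# References between tuples (writes -> paper, writes -> author).
refs = [
    ("writes:1", "paper:1"), ("writes:1", "author:1"),
    ("writes:2", "paper:1"), ("writes:2", "author:2"),
]


def adjacency(refs):
    """Undirected adjacency, assuming references can be walked both ways."""
    adj = {}
    for src, dst in refs:
        adj.setdefault(src, set()).add(dst)
        adj.setdefault(dst, set()).add(src)
    return adj


def match(text, keyword):
    """S_i: the set of nodes whose text contains the keyword."""
    return {n for n, t in text.items() if keyword.lower() in t.lower()}


def path_to_nearest(adj, root, targets):
    """BFS from root; return the edges of a shortest path to any target."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node in targets:
            path = []
            while parent[node] is not None:
                path.append((parent[node], node))
                node = parent[node]
            return path
        for nxt in sorted(adj.get(node, ())):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # no target reachable from this root


def best_answer(text, refs, keywords, allowed_roots):
    """Return (root, edges) of the smallest tree covering every keyword."""
    adj = adjacency(refs)
    node_sets = [match(text, k) for k in keywords]
    best = None
    for root in sorted(allowed_roots):
        edges = set()
        for targets in node_sets:
            path = path_to_nearest(adj, root, targets)
            if path is None:
                break
            edges.update(path)
        else:
            if best is None or len(edges) < len(best[1]):
                best = (root, edges)
    return best


# Restrict the information node (root) to the paper relation.
root, edges = best_answer(text, refs, ["Mayssam", "AnHai"], {"paper:1"})
```

For the query in the answer example, the sketch roots the tree at the paper tuple and reaches each author through one writes tuple.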

  7. Score of Result T
     • A combining function Score combines the scores of the attribute values of T
     • One reasonable choice: $\mathrm{Score}(T) = \sum_{a \in T} \mathrm{Score}(a) / \mathrm{size}(T)$
     • Attribute value scores Score(a) are calculated using the DBMS's IR index
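
The combining function above amounts to a sum divided by the tree size. A minimal sketch, assuming the per-attribute scores have already been obtained from the DBMS's IR index and that size(T) is passed in separately; the function name and the sample numbers are hypothetical.

```python
def tree_score(attribute_scores, size):
    """Score(T) = sum of Score(a) over attribute values a in T, divided by size(T)."""
    if size <= 0:
        raise ValueError("size(T) must be positive")
    return sum(attribute_scores) / size


# Hypothetical result tree with three tuples; one attribute matches no keyword.
score = tree_score([0.8, 0.6, 0.0], size=3)
```

Dividing by size(T) penalizes large trees, so a compact answer outranks a sprawling one with the same total attribute score.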

  8. Implementation
     • [Architecture diagram; labels: Browser / Client, Http, Web Server, JSPs, Servlets, Java Beans, Java API, EasyDB Components, JDBC, DBLP, DBLife]

  9. Searching over Multiple Databases: System Architecture
     • [Architecture diagram with two phases, "Preprocessing: Offline" and "Querying: Online"; labels: DBLP, DBLife, Index Builder, DBLife IR Index, DBLP IR Index, Join Discovery, Schema Matching, ForeignKey Joins, User, Q, IR Engine, Tuplesets, Top-k Generator, SQL Queries, Distributed SQL Query Processor]

  10. Top-K Generator
     • Contributions:
     • Iterative Refinement Algorithm (IRA)
     • A unifying framework to search for the top-k best tuple trees
     • Cast previous algorithms into IRA
     • Improve them substantially

  11. IRA Framework
     • Concepts: abstract state, concrete state, score interval
     • IRA algorithm: branch-and-bound search
       1. Abstraction: create the initial abstract states
       2. While fewer than k states have been output, iterate:
          (a) Evaluation: update the score intervals
          (b) Elimination: eliminate (prune) the space of states
          (c) Refinement: select an abstract state and refine it
          (d) If the goal state (the top-1 state) is found: output it and remove it
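
The loop above can be sketched as a small branch-and-bound search. This is a minimal sketch and not the presenters' implementation. The `State` class, the `children` field standing in for refinement, the precomputed score intervals, the pruning rule, and the "highest upper bound first" selection rule are all assumptions. The sample intervals are the ones shown on the IRA example slide, with k = 2.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    name: str
    lo: float  # lower bound of the score interval
    hi: float  # upper bound of the score interval
    children: list = field(default_factory=list)  # refinement of an abstract state

    def is_concrete(self):
        # Assumption: a concrete state has an exact score (lo == hi).
        return self.lo == self.hi


def ira_top_k(initial_states, k):
    frontier = list(initial_states)  # 1. Abstraction
    output = []
    while len(output) < k and frontier:
        # (a) Evaluation: intervals are precomputed in this sketch.
        # (b) Elimination: drop states that cannot reach the remaining top slots.
        need = k - len(output)
        lows = sorted((s.lo for s in frontier), reverse=True)
        if len(lows) >= need:
            threshold = lows[need - 1]
            frontier = [s for s in frontier if s.hi >= threshold]
        # Pick the state with the highest upper bound; prefer concrete on ties.
        best = max(frontier, key=lambda s: (s.hi, s.is_concrete()))
        frontier.remove(best)
        if best.is_concrete():
            # (d) Goal: nothing left can score higher, so output it.
            output.append(best)
        else:
            # (c) Refinement: replace the abstract state by its children.
            if not best.children:
                raise ValueError(f"abstract state {best.name} has no refinement")
            frontier.extend(best.children)
    return output


p = State("P", 0.6, 1.0, [State("P1", 0.6, 0.8), State("P2", 0.9, 0.9),
                          State("P3", 0.7, 0.7)])
q = State("Q", 0.5, 0.7)
r = State("R", 0.4, 0.9, [State("R1", 0.4, 0.6), State("R2", 0.85, 0.85)])

top2 = ira_top_k([p, q, r], k=2)
```

On these intervals the sketch refines P, outputs P2, refines R, and then outputs R2. Q, P1, and R1 are pruned before they would need refining, which is why they carry no children here.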

  12. IRA - Example
     • [Figure: three iterations of IRA]
     • Iteration 1: P [0.6, 1], Q [0.5, 0.7], R [0.4, 0.9]
     • Iteration 2: P1 [0.6, 0.8], P2 0.9, P3 0.7, R [0.4, 0.9]; K = {P2, P3}, min score = 0.7
     • Iteration 3: R1 [0.4, 0.6], R2 0.85; Res = {P2, R2}, min score = 0.85

  13. IRA Algorithms
     • Kite: a straightforward adaptation of the state-of-the-art algorithm (hybrid) to IRA
     • aKite: adaptive Kite, able to change and adapt over time
     • daKite: an adaptive Kite algorithm armed with more sophisticated refinement rules (read: more cost-effective search heuristics)

  14. Preliminary Experiments
     • Experiments currently run over DBLP data

  15. Future Work
     • Better UI and browsing facilities
     • User feedback
     • Extend to handle XML data

  16. References
     • V. Hristidis, L. Gravano, Y. Papakonstantinou, "Efficient IR-Style Keyword Search over Relational Databases"
     • S. Agrawal, S. Chaudhuri, G. Das, "DBXplorer: A System for Keyword Search over Relational Databases"
     • G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, "Keyword Searching and Browsing in Databases using BANKS"
