Keyword Searching and Browsing in Databases using BANKS • Gaurav Bhalotia, Arvind Hulgeri, Charuta Nakhe, Soumen Chakrabarti, S. Sudarshan • 18th International Conference on Data Engineering (ICDE'02), 2002 • Kushal Bansal
Outline • Introduction • Database and Query Model • Searching for the Best Answers • Interface and Templates of BANKS • Experiment and Performance • Conclusion
Introduction • Web search engines accept unstructured keyword queries • Users type in keywords and follow hyperlinks • Relational databases use structured query languages such as SQL • Users need to know the schema of the database • This is difficult for naïve users • Keyword-based search techniques do not carry over directly to data stored in databases • Data is often split across tables due to normalization, so the information matching a query may be spread over several tuples
Introduction • BANKS (Browsing ANd Keyword Searching) • A system that provides a search-engine-style interface for searching and browsing relational databases • Users can interact with controls on the displayed results • No query language or programming is required
Outline • Introduction • Database and Query Model • Informal Model • Formal Model • Query and Answer Model • Searching for the Best Answers • Interface and Templates of BANKS • Experiment and Performance • Conclusion
Database and Query Model Informal Model 1. Each database is modeled as a directed graph. 2. Each tuple in the database is modeled as a node in the graph. 3. Every foreign key – primary key link between two tuples is modeled as a directed edge.
Database and Query Model Informal Model 4. An answer to a query is a subgraph connecting nodes matching the keywords. 5. The importance of a link depends upon the relations it connects and on its semantics
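The informal model can be sketched in Python as an adjacency map. This is a minimal sketch, not the BANKS implementation: nodes are assumed to be hypothetical (relation, key) pairs, and each edge is assumed to point from the referencing tuple to the referenced tuple.

```python
def build_graph(tuples, fk_links):
    """Model a database as a directed graph (sketch).

    tuples:   iterable of (relation, key) pairs; each tuple becomes one node.
    fk_links: iterable of (referencing, referenced) node pairs, one per
              foreign key -> primary key link. Edge direction is an assumption.
    Returns a dict mapping each node to the set of nodes it points to.
    """
    graph = {node: set() for node in tuples}
    for src, dst in fk_links:
        if src not in graph or dst not in graph:
            raise KeyError(f"link {src!r} -> {dst!r} refers to an unknown tuple")
        graph[src].add(dst)
    return graph


# Hypothetical fragment: one Writes tuple links an author to a paper.
example = build_graph(
    tuples=[("Author", "soumen"), ("Paper", "p1"), ("Writes", 1)],
    fk_links=[(("Writes", 1), ("Author", "soumen")),
              (("Writes", 1), ("Paper", "p1"))],
)
```

Keeping tuples as nodes (rather than relations) is what lets an answer span rows that normalization split across tables.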
Database and Query Model • The Schema
Database and Query Model • Fragment of the Database
Database and Query Model Formal Database Model • Node Weight • Each node $u$ in the graph is assigned a weight $N(u)$ • The node weight is also called the node's prestige • $N(u)$ = indegree of $u$ • Node score $N$ = root node weight + sum of leaf node weights
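A small sketch of the node weights and node score as this slide states them: $N(u)$ is the indegree, and a tree's node score adds the root weight to the leaf weights. The graph below is hypothetical, and the sketch omits any scaling or normalization the full system may apply.

```python
def indegree_weights(graph):
    """N(u) = indegree of u, i.e. the number of edges that point at u."""
    weights = {u: 0 for u in graph}
    for targets in graph.values():
        for v in targets:
            weights[v] = weights.get(v, 0) + 1
    return weights


def node_score(weights, root, leaves):
    """Node score of an answer tree: root weight plus the sum of leaf weights."""
    return weights[root] + sum(weights[leaf] for leaf in leaves)


# Hypothetical graph: two Writes tuples (w1, w2) both reference paper p1.
graph = {"w1": {"a1", "p1"}, "w2": {"a2", "p1"},
         "a1": set(), "a2": set(), "p1": set()}
weights = indegree_weights(graph)                 # p1: 2, a1: 1, a2: 1, w1: 0, w2: 0
score = node_score(weights, "p1", ["a1", "a2"])   # 2 + 1 + 1 = 4
```

The paper p1 is referenced twice, so it carries more prestige than either author node.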
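Database and Query Model • Formal Database Model • Edge Weights • $R(u)$ is the relation containing tuple $u$, $s(R_1, R_2)$ is the weight assigned to links from relation $R_1$ to relation $R_2$, and $IN(u)$ is the indegree of $u$ • The weight $w(u,v)$ of the directed edge $(u,v)$ is: • $s(R(u),R(v))$ if the link $(u,v)$ exists but $(v,u)$ does not • $IN(u)\, s(R(v),R(u))$ if the link $(v,u)$ exists but $(u,v)$ does not • $\min[\, s(R(u),R(v)),\ IN(u)\, s(R(v),R(u)) \,]$ if both exist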
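The three cases for $w(u,v)$ can be sketched as below. This is a hedged illustration: `relation_of`, `s`, and `indeg` are hypothetical inputs, $IN(u)$ is read as the plain indegree of $u$, and every relation pair is given weight 1 in the example.

```python
def edge_weight(u, v, links, relation_of, s, indeg):
    """Weight w(u, v) following the three cases on the slide (sketch).

    links:       set of (x, y) pairs, one per foreign key link x -> y.
    relation_of: maps a node to R(node), the relation holding that tuple.
    s:           function returning the schema-level weight s(R1, R2).
    indeg:       maps a node to IN(node), read here as its indegree (assumption).
    """
    forward = (u, v) in links
    backward = (v, u) in links
    if not forward and not backward:
        raise ValueError(f"no link between {u!r} and {v!r}")
    fwd_w = s(relation_of[u], relation_of[v])
    bwd_w = indeg[u] * s(relation_of[v], relation_of[u])
    if forward and backward:
        return min(fwd_w, bwd_w)
    return fwd_w if forward else bwd_w


def uniform(r1, r2):
    """Hypothetical schema weights: every relation pair gets weight 1."""
    return 1.0


links = {("w1", "a1"), ("w1", "p1"), ("w2", "a2"), ("w2", "p1")}
relation_of = {"w1": "Writes", "w2": "Writes", "a1": "Author",
               "a2": "Author", "p1": "Paper"}
indeg = {"w1": 0, "w2": 0, "a1": 1, "a2": 1, "p1": 2}

fwd = edge_weight("w1", "p1", links, relation_of, uniform, indeg)  # 1.0
bwd = edge_weight("p1", "w1", links, relation_of, uniform, indeg)  # 2 * 1.0 = 2.0
```

Scaling the backward edge by $IN(u)$ makes it costlier to walk out of a heavily referenced node, so hub tuples do not cheaply connect unrelated answers.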
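Database and Query Model • Formal Database Model • Edge Weights • Edge score of a single edge: $Escore(e) = w(e)/w_{min}$, where $w_{min}$ is the minimum edge weight in the graph • Overall edge score of a tree: $Escore = 1/(1 + \sum_e Escore(e))$ • The overall edge score lies in $[0,1]$; a tree with fewer or lighter edges scores higher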
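A minimal sketch of the overall edge score, taking the list of edge weights in one answer tree and the graph-wide minimum weight $w_{min}$ as inputs (both hypothetical here). It uses the linear form on the slide, without the log scaling mentioned in the experiments.

```python
def edge_score(tree_edge_weights, w_min):
    """Overall edge score of an answer tree: 1 / (1 + sum of w(e) / w_min)."""
    if w_min <= 0:
        raise ValueError("w_min must be positive")
    return 1.0 / (1.0 + sum(w / w_min for w in tree_edge_weights))


two_edges = edge_score([1.0, 2.0], w_min=1.0)  # 1 / (1 + 1 + 2) = 0.25
single_node = edge_score([], w_min=1.0)        # a tree with no edges scores 1.0
```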
Database and Query Model Formal Database Model • The overall relevance score combines the node score $N$ and the edge score $E$ • Using a weighting factor $\lambda$: • Additive: $(1-\lambda)E + \lambda N$ • Multiplicative: $E \cdot N^{\lambda}$
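Both combinations can be sketched in a few lines. The sketch assumes $E$ and $N$ have already been normalized to $[0,1]$ (the slide does not say how), and it names the weighting factor `lam` because `lambda` is reserved in Python.

```python
def combined_score(e, n, lam, mode="additive"):
    """Combine edge score E and node score N with weighting factor lambda.

    Assumes E and N have both been normalized to [0, 1].
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1]")
    if mode == "additive":
        return (1 - lam) * e + lam * n
    if mode == "multiplicative":
        return e * n ** lam
    raise ValueError(f"unknown mode {mode!r}")


additive = combined_score(0.25, 0.5, 0.2)                          # about 0.3
multiplicative = combined_score(0.25, 0.5, 0.2, "multiplicative")  # about 0.218
```

With a small $\lambda$ both forms are dominated by the edge score, so compact trees win over trees that merely touch prestigious nodes.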
Database and Query Model • Query and Answer Model • Query • A query consists of search terms $t_1, t_2, \ldots, t_n$ • For each term $t_i$, we find the set of nodes $S_i$ that are relevant to $t_i$; $S = \{S_1, S_2, \ldots, S_n\}$ • Answer Model • An answer to a query is a rooted directed tree connecting keyword nodes • The relevance score of an answer tree is computed from the relevance scores of its nodes and the weights of its edges
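Building $S = \{S_1, \ldots, S_n\}$ can be sketched as follows. The slide does not define "relevant", so this sketch assumes a case-insensitive whole-word match against hypothetical text attached to each node.

```python
def keyword_node_sets(terms, node_text):
    """Return [S_1, ..., S_n]: for each term t_i, the nodes whose text contains it.

    Relevance is assumed to be a case-insensitive whole-word match; the slide
    does not specify how a node is judged relevant to a term.
    """
    indexed = {node: set(text.lower().split()) for node, text in node_text.items()}
    return [{node for node, words in indexed.items() if term.lower() in words}
            for term in terms]


# Hypothetical text attached to three tuples.
node_text = {"a1": "Soumen Chakrabarti", "a2": "Sunita Sarawagi",
             "p1": "Keyword search in databases"}
S = keyword_node_sets(["soumen", "sunita"], node_text)  # [{"a1"}, {"a2"}]
```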
Database and Query Model • Result of query “soumen and sunita”
Outline • Introduction • Database and Query Model • Searching for the best answers • Backward expanding search algorithm • Interface and Templates of BANKS • Experiment and Performance • Conclusion
Searching for the Best Answers • Backward expanding search algorithm • Assumes that the graph of the database fits in memory • Starts at the nodes that contain a query keyword; these become the leaves of answer trees • Runs concurrent single-source shortest-path searches from each such node • Traverses graph edges in the reverse direction • A vertex reached by backward paths from every keyword's node set identifies an answer-tree root • The tree formed is a connection tree, and its root is the information node
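The search can be sketched as interleaved Dijkstra expansions over reversed edges. This is a simplified illustration under stated assumptions: one shared priority queue drives all expansions, a node becomes a candidate root once every keyword set has reached it, and roots are ranked by the sum of their shortest-path distances rather than by the full relevance score. It does not build the trees themselves or prune degenerate roots.

```python
import heapq
import itertools
from collections import defaultdict


def backward_expanding_search(graph, keyword_sets):
    """Simplified backward expanding search (sketch, not the BANKS code).

    graph:        dict u -> {v: w(u, v)} for every directed edge u -> v.
    keyword_sets: list of node sets S_1..S_n, one per query term.
    Returns (root, cost) pairs sorted by cost, where cost is the sum of the
    shortest-path distances from the root to each keyword set.
    """
    # Index edges by their head so we can walk them in reverse (v back to u).
    reverse = defaultdict(list)
    for u, nbrs in graph.items():
        for v, w in nbrs.items():
            reverse[v].append((u, w))

    n = len(keyword_sets)
    dist = [{} for _ in range(n)]   # settled distance from each keyword set
    tie = itertools.count()         # tie-breaker so nodes are never compared
    heap = [(0.0, i, next(tie), node)
            for i, s in enumerate(keyword_sets) for node in s]
    heapq.heapify(heap)
    roots = []

    # One shared queue interleaves the n expansions, always extending
    # whichever frontier entry is currently closest to its keyword set.
    while heap:
        d, i, _, v = heapq.heappop(heap)
        if v in dist[i]:
            continue                # already settled for keyword set i
        dist[i][v] = d
        # v becomes a candidate root once every keyword set has reached it.
        if all(v in dist[j] for j in range(n)):
            roots.append((v, sum(dist[j][v] for j in range(n))))
        for u, w in reverse.get(v, ()):
            if u not in dist[i]:
                heapq.heappush(heap, (d + w, i, next(tie), u))

    return sorted(roots, key=lambda r: r[1])


# Hypothetical graph: paper p1 reaches authors a1 and a2 via Writes tuples.
graph = {"p1": {"w1": 1.0, "w2": 1.0}, "w1": {"a1": 1.0}, "w2": {"a2": 1.0}}
answers = backward_expanding_search(graph, [{"a1"}, {"a2"}])  # [("p1", 4.0)]
```

Here p1 is the information node: the backward searches from "soumen" (a1) and "sunita" (a2) first meet there.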
Interface • The BANKS system provides • A rich interface for browsing data stored in a relational database • Schema browsing and data browsing • Hyperlinks to referenced tuples • Columns can be projected away (dropped) • Selections can be imposed on any column • Tuples can be sorted by a specified column
Templates • The BANKS system provides several predefined templates • Cross-tabs • Group-by • Folder views • Graphical display as bar, line, or pie charts
Experiment and Performance • For each parameter setting, computed the absolute difference between the rank of each ideal answer and its rank in the system's output • The sum of these rank differences gives the raw error score • The setting $\lambda = 0.2$ with log scaling of edge weights performed best, with an error score of 0.0
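The raw error score is simple arithmetic over ranks. In this sketch the answer identifiers and rankings are hypothetical, and every ideal answer is assumed to appear in the system's output (a missing answer raises `KeyError`).

```python
def raw_error_score(ideal_ranks, system_ranks):
    """Sum over ideal answers of |ideal rank - rank returned by the system|."""
    return sum(abs(ideal_ranks[a] - system_ranks[a]) for a in ideal_ranks)


ideal = {"A": 1, "B": 2, "C": 3}
runs = {  # hypothetical rankings produced under two parameter settings
    "setting-1": {"A": 1, "B": 3, "C": 2},
    "setting-2": {"A": 1, "B": 2, "C": 3},
}
errors = {name: raw_error_score(ideal, ranks) for name, ranks in runs.items()}
best = min(errors, key=errors.get)   # "setting-2", error 0
```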
Conclusion • BANKS system • Provides an integrated browsing and keyword querying system for relational databases • Allows users with no knowledge of database systems or schemas to query and browse relational databases with ease • Reduces the effort involved in publishing relational data on the web and making it searchable