EasyQuerier: A Keyword Interface in Web Database Integration System • Xian Li¹, Weiyi Meng², Xiaofeng Meng¹ • ¹ WAMDM Lab, RUC • ² SUNY Binghamton
Traditional Integrated Interface • [Figure: for a query Q, the domain is selected manually from a domain list, and the integrated interface of Job is then filled in manually]
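What does EasyQuerier look like • [Figure: the user manually enters a query Q into EasyQuerier; Q is then automatically routed among the integrated interfaces (……, integrated interface of Job) and automatically filled in]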
New Features of EasyQuerier • Automatic domain mapping • Users do not need to select a domain from a long list • More flexible keyword queries • Different kinds of data types • Text, numeric, currency, date • More logical relations covered • “and”, “or”, “between…and” • Q1: New York or Washington, education, $2000-$3000 • U1={New York, Washington}, logic: or • U2={education} • U3={$2000, $3000}, logic: range • Automatic query translation
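The Q1 example above splits a keyword query into units, each with its own values and logic relation. A minimal sketch of such a split follows, assuming commas separate units and that a dash between two numeric-looking values marks a range. The names `QueryUnit`, `parse_unit`, and `parse_query`, and the label "single" for a unit without a logic relation, are hypothetical and not taken from the slides.

```python
import re
from dataclasses import dataclass


@dataclass
class QueryUnit:
    values: list[str]
    logic: str  # "single", "or", "and", or "range" (labels are assumptions)


# "between X and Y" is treated as a range.
_BETWEEN = re.compile(r"^between\s+(.+?)\s+and\s+(.+)$", re.IGNORECASE)
# "X-Y" is treated as a range only when both sides contain a digit,
# so hyphenated words such as "part-time" stay single values.
# Limitation: a date written as 2008-01-01 would need extra handling.
_DASH = re.compile(r"^(\S*\d\S*)\s*-\s*(\S*\d\S*)$")


def parse_unit(text: str) -> QueryUnit:
    text = text.strip()
    m = _BETWEEN.match(text) or _DASH.match(text)
    if m:
        return QueryUnit([m.group(1).strip(), m.group(2).strip()], "range")
    for logic in ("or", "and"):
        parts = re.split(rf"\s+{logic}\s+", text, flags=re.IGNORECASE)
        if len(parts) > 1:
            return QueryUnit([p.strip() for p in parts], logic)
    return QueryUnit([text], "single")


def parse_query(query: str) -> list[QueryUnit]:
    # Assumption: units are comma-separated, as in Q1.
    return [parse_unit(part) for part in query.split(",") if part.strip()]


units = parse_query("New York or Washington, education, $2000-$3000")
```

Here `units` holds three entries that mirror U1, U2, and U3 from the slide.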
EasyQuerier: overview • Part 1: Domain mapping • Collect domain knowledge from the candidate domains • Similarity-based domain mapping strategy • Part 2: Query translation • Partial keyword-attribute mapping • Holistic keyword-attribute mapping
Challenge 1: Domain Mapping • Problem statement • Map a user query to the correct domain automatically, without requiring the user to enter the domain separately. • Our solution • Domain representation model • Term weight assignment • Query-domain similarity
Domain mapping(1) • Domain representation model • D = ⟨d_ID; CT; AT; VT⟩ • d_ID: unique domain identifier. • CT = {ct_i | i = 1, 2, …} is a set of Conceptual Terms, which describe the whole domain concept • AT = ∪_{A_i ∈ D} DAL(d_ID, A_i) is a set of Attribute Label Terms consisting of the attribute labels of the products in this domain • InteLabel, LocalLabel, OtherLabel • VT = ∪_{A_i ∈ D} DAV(d_ID, A_i) is a set of the Value Terms associated with the products’ attributes in the domain • Text attribute: InteValue, LocalValue, OtherValue • Non-text attribute: VT can be characterized by the pre-defined ranges available on the integrated interfaces.
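The tuple ⟨d_ID; CT; AT; VT⟩ can be pictured as a small data structure. The sketch below is one possible layout, not the authors' implementation: the class names `Attribute` and `Domain`, the field names, and the sample job-domain data are all assumptions made for illustration. AT and VT are derived as unions over the per-attribute label and value sets, as in the definitions above.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    labels: set[str] = field(default_factory=set)  # plays the role of DAL(d_ID, A_i)
    values: set[str] = field(default_factory=set)  # plays the role of DAV(d_ID, A_i)


@dataclass
class Domain:
    d_id: str                          # unique domain identifier
    ct: set[str]                       # conceptual terms
    attributes: dict[str, Attribute]   # attribute name -> labels and values

    @property
    def at(self) -> set[str]:
        # AT: union of the label sets of all attributes
        return set().union(*(a.labels for a in self.attributes.values()))

    @property
    def vt(self) -> set[str]:
        # VT: union of the value sets of all attributes
        return set().union(*(a.values for a in self.attributes.values()))


# Illustrative data only; not taken from the slides.
job = Domain(
    d_id="job",
    ct={"job", "career"},
    attributes={
        "location": Attribute({"location", "city"}, {"New York", "Washington"}),
        "category": Attribute({"category"}, {"education"}),
    },
)
```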
Domain mapping(2) • Different terms differ in their ability to differentiate domains. • “price” is less powerful than “title” in differentiating the book domain from others • Term weight assignment • Adopt the idea of CVV (cue validity variance), which measures the skew of the distribution of terms across all document databases • if_ij is the number of times t_j appears in either AT or VT of D_i • CVV_j is the CVV for t_j • Weight(D_i, t_j) = CVV_j * if_ij
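The weighting rule Weight(D_i, t_j) = CVV_j * if_ij can be sketched as follows. The slides do not give the CVV formula, so this sketch substitutes a simple stand-in for skew: the population variance of a term's share of occurrences across domains. That substitution, the function name `term_weights`, and the sample counts are assumptions, not the original CVV definition.

```python
from statistics import pvariance


def term_weights(freq: dict[str, dict[str, int]]) -> dict[str, dict[str, float]]:
    """freq[domain][term] is if_ij; returns weight[domain][term] = skew_j * if_ij."""
    domains = list(freq)
    terms = {t for counts in freq.values() for t in counts}
    skew = {}
    for t in terms:
        total = sum(freq[d].get(t, 0) for d in domains)
        if total == 0:
            skew[t] = 0.0
            continue
        # Stand-in for CVV_j (assumption): variance of the term's share per domain.
        shares = [freq[d].get(t, 0) / total for d in domains]
        skew[t] = pvariance(shares)
    return {d: {t: skew[t] * n for t, n in freq[d].items()} for d in domains}


# Illustrative counts only: "price" occurs evenly, "title" only in the book domain.
weights = term_weights({
    "book": {"price": 10, "title": 10},
    "car": {"price": 10},
})
```

With these counts, "price" is spread evenly across both domains and receives weight 0, while "title" is concentrated in the book domain and receives a positive weight, which matches the intuition stated on the slide.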
Domain mapping(3) • Q = {u_1, u_2, …, u_n}, u_i = {v_i1, v_i2, …} • Q1 example • u_1 = {New York, Washington}, v_11 = New York, v_12 = Washington • For each query value, only the best-matching term t_j in VT or AT is recorded • [Query-domain similarity equation not preserved in the slide text]
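Because the similarity equation did not survive in the slide text, the following is only a guess at its general shape, not the authors' formula. It assumes that each query value contributes the weight of its single best-matching domain term, scaled by the string similarity, and that the domain with the highest total is selected. The functions `term_sim`, `query_domain_score`, and `map_domain` are hypothetical, and `difflib.SequenceMatcher` stands in for whatever term similarity the system uses.

```python
from difflib import SequenceMatcher


def term_sim(a: str, b: str) -> float:
    # Stand-in string similarity in [0, 1] (assumption).
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def query_domain_score(values: list[str], weights: dict[str, float]) -> float:
    """weights maps each AT/VT term of one domain to Weight(D_i, t_j)."""
    score = 0.0
    for v in values:
        # Record only the best-matching term for this query value.
        best = max(weights, key=lambda t: term_sim(v, t), default=None)
        if best is not None:
            score += term_sim(v, best) * weights[best]
    return score


def map_domain(values: list[str], domain_weights: dict[str, dict[str, float]]) -> str:
    # Pick the domain whose terms best cover the query values.
    return max(domain_weights, key=lambda d: query_domain_score(values, domain_weights[d]))


# Illustrative weights only; not taken from the slides.
domain_weights = {
    "job": {"education": 2.0, "New York": 1.5, "salary": 1.0},
    "book": {"title": 2.5, "author": 2.0},
}
chosen = map_domain(["New York", "education"], domain_weights)
```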
Challenge 2: Query translation • Problem statement • Translate the query to the integrated interface • Just like filling in the integrated interface with a set of keywords • Computation model • Def 4.1 (Keyword-Attribute Matching (KAM)). KAM(u, A). • Def 4.2 (Degree of Matching (DM)). Each KAM has a degree of matching. • Def 4.3 (Query Translation Solution (QTS)). A QTS represents a strategy for filling in the query interface. A QTS comprises several KAMs. • Def 4.4 (Conviction). This measurement determines whether a QTS is reasonable. The larger the DM of a KAM, the more reasonable the KAM is. Such KAMs, combined together, generate the optimal QTS.
Query translation(1) • Computation of DM • For Q = {u_1, u_2, …, u_n}, u_i = {v_i1, v_i2, …}, Sim(v_xi, A_j) is the maximum of Sim(v_xi, t_j) over all terms t_j in the VT of A_j • Sim(v_xi, t_j) is computed as in domain mapping
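The value-to-attribute similarity is a maximum over the attribute's value terms, which the short sketch below illustrates. The term-level similarity is passed in as a parameter because the slides only say it is the same as in domain mapping. The function name `attr_sim` and the exact-match similarity used in the example are assumptions.

```python
from typing import Callable, Iterable


def attr_sim(v: str, value_terms: Iterable[str], sim: Callable[[str, str], float]) -> float:
    """Sim(v, A_j): the best Sim(v, t_j) over the value terms t_j of attribute A_j."""
    return max((sim(v, t) for t in value_terms), default=0.0)


def exact(a: str, b: str) -> float:
    # Toy term similarity for the example only: 1.0 on a case-insensitive exact match.
    return 1.0 if a.lower() == b.lower() else 0.0


location_terms = {"New York", "Washington", "Boston"}
score = attr_sim("new york", location_terms, exact)
```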
Query translation(2) • Conviction • The conviction value of a QTS is a weighted sum of the DMs of its KAMs • Why weight? • An attribute that appears in more local interfaces of a domain is more important in that domain. • Each attribute A_j is assigned a weight w(A_j) based on its interface frequency if_i • For an attribute within the domain D: [weight formula not preserved in the slide text]
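Conviction as a weighted sum of DMs can be sketched as below. The weight formula is missing from the slide text, so this sketch assumes w(A_j) is the fraction of local interfaces that contain the attribute. It also assumes each query unit is matched to a distinct attribute and enumerates all such assignments by brute force. The function names and the sample numbers are hypothetical.

```python
from itertools import permutations


def attribute_weights(interface_freq: dict[str, int], num_interfaces: int) -> dict[str, float]:
    # Assumption: weight = share of local interfaces containing the attribute.
    return {a: n / num_interfaces for a, n in interface_freq.items()}


def conviction(qts, dm, w) -> float:
    """qts is a list of (unit, attribute) KAMs; dm maps a KAM to its degree of matching."""
    return sum(w[a] * dm.get((u, a), 0.0) for u, a in qts)


def best_qts(units, attributes, dm, w):
    # Enumerate every assignment of units to distinct attributes (brute force).
    candidates = (list(zip(units, attrs)) for attrs in permutations(attributes, len(units)))
    return max(candidates, key=lambda q: conviction(q, dm, w), default=[])


# Illustrative numbers only; not taken from the slides.
w = attribute_weights({"location": 50, "category": 40, "salary": 25}, num_interfaces=50)
dm = {
    ("New York", "location"): 1.0,
    ("education", "category"): 0.9,
    ("education", "location"): 0.2,
}
solution = best_qts(["New York", "education"], ["location", "category", "salary"], dm, w)
```

Brute-force enumeration is only practical for the handful of units and attributes found on a typical search interface; a larger setting would call for an assignment algorithm.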
Experiment • Settings • 9 domains, each covering 50 web databases • 10 students, 20 keyword queries for each domain • Measurement • Correct/acceptable/wrong • Overall/with domain/with attribute label/value only • Fig. 1: domain mapping accuracy • Fig. 2: query translation accuracy
Conclusion • We proposed EasyQuerier, a novel keyword-based interface system that lets ordinary users query structured data in various Web databases. • We developed solutions to two technical challenges • Mapping a keyword query to the appropriate domain • Translating the keyword query into a query for the integrated search interface of that domain