Entity Search with NECESSITY. 12th Workshop on Web and Databases (WebDB). Ekaterini Ioannou, Saket Sathe, Nicolas Bonvin, Anshul Jain, Srikanth Bondalapati, Gleb Skobeltsyn, Claudia Niederee, Zoltan Miklos. L3S Hannover and EPFL Switzerland. Providing unique identifiers.
Entity Search with NECESSITY
• 12th Workshop on Web and Databases (WebDB)
• Ekaterini Ioannou, Saket Sathe, Nicolas Bonvin, Anshul Jain, Srikanth Bondalapati, Gleb Skobeltsyn, Claudia Niederee, Zoltan Miklos
• L3S Hannover and EPFL Switzerland
Providing unique identifiers
• Okkamization: entities are extracted from webpages and documents (information extraction)
• Query to the Entity Store: name=“Einstein” physicist
• Response: http://www.okkam.org/ens/idb3016709
Entities and Entity Requests
• Entities are collections of attribute-value pairs, each with an okkam-id
• Examples of entity requests
  • Q1 -- name=“Einstein” (AND) physicist
  • Q2 -- Einstein (AND) physicist
  • Q3 -- name=“Einstein” (AND) profession=“physicist”
• Example entity
  • name: Albert Einstein
  • affiliation: Institute for Advanced Study
  • profession: physicist
  • okkam-id: http://www.okkam.org/ens/id06b1791f
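The three request forms above mix attribute-value pairs with bare keywords. A minimal sketch of how such a request could be split into those two parts follows. The `EntityRequest` class, the `parse_request` function, and the use of straight double quotes are assumptions made for illustration; the slides do not describe the actual request parser.

```python
import re
from dataclasses import dataclass, field

@dataclass
class EntityRequest:
    # attribute -> value, e.g. {"name": "Einstein"} (hypothetical structure)
    pairs: dict = field(default_factory=dict)
    # bare keywords with no attribute name, e.g. ["physicist"]
    keywords: list = field(default_factory=list)

# Either attr="value" or a single bare word.
TOKEN = re.compile(r'(\w+)\s*=\s*"([^"]*)"|(\w+)')

def parse_request(text):
    """Split a request into attribute-value pairs and bare keywords."""
    request = EntityRequest()
    for attr, value, word in TOKEN.findall(text):
        if attr:
            request.pairs[attr] = value
        elif word.upper() != "AND":  # AND only joins terms, it is not a keyword
            request.keywords.append(word)
    return request

q1 = parse_request('name="Einstein" AND physicist')
q2 = parse_request('Einstein AND physicist')
q3 = parse_request('name="Einstein" AND profession="physicist"')
```

Under these assumptions, Q1 yields one pair and one keyword, Q2 yields two keywords, and Q3 yields two pairs.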
Identified Challenges
Challenge: The number of entities could be huge
• Store and retrieve using IR-based techniques
• Matching on very large datasets
• “Narrow” down the result set to a more tractable set of matching candidates
Challenge: A single algorithm for fine-grained entity matching may not exist
• Use a range of matching modules
• Matching using relationships and without schema information
• Explicitly defined by user/application
NECESSITY Setup
• Approx. 1 million entities extracted and indexed
  • People and organizations from Wikipedia
  • Locations from GeoNames
  • Proteins from UniProt
• Software architecture
  • Lucene for handling the inverted index
  • Solr for index distribution and load balancing
  • HBase (Voldemort) for storing entity profiles
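To illustrate what the inverted index contributes to this setup, here is a toy in-memory version in plain Python. It is not Lucene's API, only a sketch of the idea: each term maps to the set of entities whose profile contains it, so a conjunctive request becomes a set intersection. The entity ids `e1` and `e2` and the function names are hypothetical.

```python
from collections import defaultdict

def build_index(entities):
    """Map each lowercased term to the ids of entities containing it."""
    index = defaultdict(set)
    for entity_id, profile in entities.items():
        for value in profile.values():
            for term in value.lower().split():
                index[term].add(entity_id)
    return index

def lookup(index, terms):
    """Return ids of entities that contain every term (AND semantics)."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()

# Hypothetical sample profiles, keyed by made-up ids.
entities = {
    "e1": {"name": "Albert Einstein", "profession": "physicist"},
    "e2": {"name": "Niels Bohr", "profession": "physicist"},
}
index = build_index(entities)
candidates = lookup(index, ["Einstein", "physicist"])
```

In the deployed system the entity profiles themselves would live in the profile store, with the index holding only what is needed to find candidate ids.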
NECESSITY Search Process
Request: name=“Einstein” AND physicist
OKKAM Match API
• Receive the entity request
• Convert request and select matching module
• Matching modules: Product Matching, Group Linkage, Generic Matching, …
• Module selection
  • Entity type: inferred from attributes, or identified from receiver
  • Required response time
  • …
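The module-selection step can be pictured as a small dispatch function. The sketch below is purely illustrative: the attribute hints in `infer_type`, the 50 ms cutoff, the `MODULES` registry, and all function names are assumptions, since the slides only name the selection criteria and not the actual rules.

```python
# Hypothetical registry: entity type -> matching module name.
# Other modules (e.g. Group Linkage) would be registered the same way.
MODULES = {"product": "ProductMatching"}
DEFAULT_MODULE = "GenericMatching"

def infer_type(attributes):
    """Guess the entity type from attribute names (made-up hints)."""
    if "price" in attributes or "brand" in attributes:
        return "product"
    return "unknown"

def select_module(attributes, declared_type=None, max_response_ms=None):
    """Pick a matching module from entity type and response-time budget."""
    # Assumption: under a very tight time budget, fall back to the generic module.
    if max_response_ms is not None and max_response_ms < 50:
        return DEFAULT_MODULE
    # Prefer a type supplied with the request; otherwise infer it.
    entity_type = declared_type or infer_type(attributes)
    return MODULES.get(entity_type, DEFAULT_MODULE)

module = select_module({"name": "Einstein"})
```

Keeping the rules in a registry rather than in nested conditionals would let an application define its own modules explicitly, in line with the "explicitly defined by user/application" point made earlier.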
NECESSITY Search Process
Request: name=“Einstein” AND physicist
OKKAM Store API and OKKAM Store Index
• Send the query to index
• Query the distributed index
  • Each server processes the query from the index and returns top-k results
  • Boost popular attributes
  • Boost attributes specified by the query
• Aggregate top-k results from each server
• Return top-k entities with scores
  • Top-k matches (IDs + scores)
  • Top-k entities (candidates)
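The aggregation step above can be sketched as a merge of per-server result lists. This is a minimal illustration, assuming each server returns `(entity_id, score)` pairs, that scores are comparable across servers, and that an id appearing on more than one server keeps its highest score. The function name `merge_top_k` and the sample ids are hypothetical.

```python
import heapq

def merge_top_k(per_server_results, k):
    """Combine per-server top-k lists into one global top-k list."""
    best = {}
    for results in per_server_results:
        for entity_id, score in results:
            # Keep the highest score seen for each entity id.
            if score > best.get(entity_id, float("-inf")):
                best[entity_id] = score
    # nlargest returns the k best entries, highest score first.
    return heapq.nlargest(k, best.items(), key=lambda item: item[1])

# Made-up results from two index servers.
server_a = [("a", 0.9), ("b", 0.5)]
server_b = [("c", 0.7), ("a", 0.4)]
top = merge_top_k([server_a, server_b], k=2)
```

Because every server already returns only its own top-k, the aggregator handles at most k entries per server regardless of index size.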
NECESSITY Search Process
Request: name=“Einstein” AND physicist
OKKAM Match API, Matching Module
• Receive matching candidates
• Advanced matching and final entities
  • Background knowledge
  • Domain-specific information
  • Analyze inner-relationships
  • Make another query
  • …
NECESSITY Search Process
Request: name=“Einstein” AND physicist
OKKAM Match API, Matching Module
• Advanced matching may use
  • Background knowledge
  • Domain-specific information
  • Analyze inner-relationships
  • Make another query
  • …
• Ranked list with matching entities (scores shown on the slide: 0.95, 0.89)
[Figure: candidate entities marked with X]
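The final step, turning candidates into a ranked list, might look like the sketch below. The scoring rule here (the fraction of requested attribute-value pairs found in a candidate's profile) is a deliberately simple stand-in, not the matching logic of any NECESSITY module. The threshold, the function name `rank_candidates`, and the sample ids are assumptions.

```python
def rank_candidates(request_pairs, candidates, threshold=0.5):
    """Score each candidate against the request and return a ranked list."""
    if not request_pairs:
        return []
    ranked = []
    for candidate in candidates:
        profile = candidate["profile"]
        # Count requested values that occur in the matching attribute.
        hits = sum(
            1
            for attr, value in request_pairs.items()
            if value.lower() in profile.get(attr, "").lower()
        )
        score = hits / len(request_pairs)
        if score >= threshold:  # drop weak candidates
            ranked.append((candidate["id"], score))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked

# Made-up candidates as they might arrive from the store.
candidates = [
    {"id": "e2", "profile": {"name": "Einstein Bros", "profession": "bakery"}},
    {"id": "e1", "profile": {"name": "Albert Einstein", "profession": "physicist"}},
    {"id": "e3", "profile": {"name": "Niels Bohr", "profession": "chemist"}},
]
ranking = rank_candidates({"name": "Einstein", "profession": "physicist"}, candidates)
```

A real module would replace the scoring rule with one informed by background knowledge, domain-specific information, or relationships between entities, as listed on the slide.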