
Entity Search with NECESSITY

12th Workshop on Web and Databases (WebDB). Ekaterini Ioannou, Saket Sathe, Nicolas Bonvin, Anshul Jain, Srikanth Bondalapati, Gleb Skobeltsyn, Claudia Niederee, Zoltan Miklos. L3S Hannover and EPFL Switzerland.


Presentation Transcript


  1. Entity Search with NECESSITY
     • 12th Workshop on Web and Databases (WebDB)
     • Ekaterini Ioannou, Saket Sathe, Nicolas Bonvin, Anshul Jain, Srikanth Bondalapati, Gleb Skobeltsyn, Claudia Niederee, Zoltan Miklos
     • L3S Hannover and EPFL Switzerland

  2. Providing unique identifiers (Okkamization)
     • Sources: entities, webpages, documents (information extraction)
     • Query to the Entity Store: name=“Einstein” physicist
     • Response: http://www.okkam.org/ens/idb3016709

  3. Entities and Entity Requests
     • An entity is a collection of attribute-value pairs with an okkam-id
     • Example entity:
       name: Albert Einstein
       affiliation: Institute of Advanced Study
       profession: physicist
       okkam-id: http://www.okkam.org/ens/id06b1791f
     • Examples of entity requests:
       Q1 -- name=“Einstein” (AND) physicist
       Q2 -- Einstein (AND) physicist
       Q3 -- name=“Einstein” (AND) profession=“physicist”
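The entity and request forms on this slide can be illustrated with a minimal Python sketch. It assumes straight double quotes in place of the slide's typographic quotes, and the names `EntityRequest`, `parse_request`, and `matches` are hypothetical. The substring test is only a stand-in and not the matching NECESSITY performs.

```python
import re
from dataclasses import dataclass, field

# Example entity from the slide: attribute-value pairs plus an okkam-id.
EINSTEIN = {
    "name": "Albert Einstein",
    "affiliation": "Institute of Advanced Study",
    "profession": "physicist",
    "okkam-id": "http://www.okkam.org/ens/id06b1791f",
}

@dataclass
class EntityRequest:
    constraints: dict = field(default_factory=dict)  # attribute -> required value
    keywords: list = field(default_factory=list)     # values with no attribute name

# Either attribute="value" or a bare word.
TOKEN = re.compile(r'(\w[\w-]*)\s*=\s*"([^"]*)"|(\w+)')

def parse_request(text):
    """Split a request such as Q1-Q3 into attribute constraints and keywords."""
    request = EntityRequest()
    for attr, value, word in TOKEN.findall(text):
        if attr:
            request.constraints[attr] = value
        elif word.upper() != "AND":  # the connective carries no content
            request.keywords.append(word)
    return request

def matches(entity, request):
    """Naive illustration: case-insensitive substring containment."""
    for attr, value in request.constraints.items():
        if value.lower() not in entity.get(attr, "").lower():
            return False
    all_values = " ".join(entity.values()).lower()
    return all(word.lower() in all_values for word in request.keywords)

q1 = parse_request('name="Einstein" AND physicist')
q2 = parse_request('Einstein AND physicist')
q3 = parse_request('name="Einstein" AND profession="physicist"')
```

Under these assumptions, all three requests select the example entity. Q2 does so without naming any attribute, which is why keywords are checked against every attribute value.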

  4. Identified Challenges
     Challenge: The number of entities could be huge
     • Store and retrieve using IR-based techniques
     • Matching on very large datasets
     • “Narrow” down the result set to a more tractable set of matching candidates
     Challenge: A single algorithm for fine-grained entity matching may not exist
     • Use a range of matching modules
     • Matching using relationships and without schema information
     • Explicitly defined by user/application

  5. NECESSITY Setup
     • Approx. 1 million entities extracted and indexed
       • People and organizations from Wikipedia
       • Locations from GeoNames
       • Proteins from UniProt
     • Software architecture
       • Lucene for handling the inverted index
       • Solr for index distribution and load balancing
       • HBase (Voldemort) for storing entity profiles

  6. NECESSITY Search Process
     • Request: name=“Einstein” AND physicist
     • OKKAM Match API
       • Receive the entity request
       • Convert request and select matching module
     • Module selection:
       • Entity type (inferred from attributes, identified from receiver)
       • Required response time
       • …
     • Matching modules: Product Matching, Group Linkage, Generic Matching
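A module-selection step of this kind could be sketched as follows. The slide names only the selection criteria and the three modules, so the attribute hints, the 500 ms threshold, and the rule mapping criteria to modules are all assumptions made for illustration.

```python
# Hypothetical hints for inferring an entity type from attribute names.
TYPE_HINTS = {
    "price": "product",
    "brand": "product",
    "profession": "person",
    "affiliation": "person",
}

def infer_entity_type(attributes):
    """Return the type suggested by the first recognised attribute, if any."""
    for attr in attributes:
        if attr in TYPE_HINTS:
            return TYPE_HINTS[attr]
    return None

def select_module(attributes, receiver_type=None, max_response_ms=None):
    """Pick a matching module from entity type and response-time budget.

    A type identified from the receiver takes precedence over an inferred
    one. The selection rules below are assumed, not taken from the slides.
    """
    entity_type = receiver_type or infer_entity_type(attributes)
    if entity_type == "product":
        return "product_matching"
    # Assumption: group linkage is the costlier module, so it is only
    # chosen when the caller allows at least 500 ms or sets no limit.
    if max_response_ms is None or max_response_ms >= 500:
        return "group_linkage"
    return "generic_matching"
```

For example, `select_module(["name"], max_response_ms=100)` falls back to generic matching under these assumed rules, while a request carrying a `price` attribute is routed to product matching.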

  7. NECESSITY Search Process
     • Request: name=“Einstein” AND physicist
     • OKKAM Store API
       • Send the query to index
       • Return top-k entities with scores
     • OKKAM Store Index
       • Query the distributed index
       • Each server processes the query from the index and returns top-k results
         • boost popular attributes
         • boost attributes specified by the query
       • Aggregate top-k results from each server
     • Results: top-k matches (IDs + scores), top-k entities (candidates)
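The aggregation of per-server top-k results can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Solr implementation: each server is assumed to return `(entity_id, score)` pairs with comparable scores, and the function name `aggregate_top_k` is hypothetical.

```python
import heapq

def aggregate_top_k(per_server_results, k):
    """Merge per-server top-k lists into one global top-k list.

    per_server_results: iterable of lists of (entity_id, score) pairs.
    If the same ID is reported by several servers, its best score is kept.
    """
    best = {}
    for results in per_server_results:
        for entity_id, score in results:
            if score > best.get(entity_id, float("-inf")):
                best[entity_id] = score
    # Highest scores first, truncated to k entries.
    return heapq.nlargest(k, best.items(), key=lambda item: item[1])

servers = [
    [("id-a", 0.9), ("id-b", 0.4)],
    [("id-c", 0.7), ("id-a", 0.5)],
]
top = aggregate_top_k(servers, 2)
```

Because every server already returns its own top-k, the global top-k is guaranteed to be contained in the union of the partial lists, so merging them is sufficient.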

  8. NECESSITY Search Process
     • Request: name=“Einstein” AND physicist
     • OKKAM Match API: receive matching candidates
     • Matching Module: advanced matching and final entities
       • Background knowledge
       • Domain-specific information
       • Analyze inner-relationships
       • Make another query
       • …

  9. NECESSITY Search Process
     • Request: name=“Einstein” AND physicist
     • OKKAM Match API: ranked list with matching entities (scores shown: 0.95, 0.89)
     • Matching Module
       • Background knowledge
       • Domain-specific information
       • Analyze inner-relationships
       • Make another query
       • …
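The final step, turning candidates into a ranked list, might look like the sketch below. The slides do not specify the scoring function, so `difflib.SequenceMatcher` from the Python standard library is used purely as a placeholder for the advanced matching; the candidate records and the name `rank_candidates` are hypothetical.

```python
from difflib import SequenceMatcher

def rank_candidates(query_name, candidates):
    """Re-score candidates and return (okkam_id, score) pairs, best first.

    The string-similarity score is a placeholder for a real matching
    module, which could also use background knowledge or relationships.
    """
    scored = []
    for candidate in candidates:
        similarity = SequenceMatcher(
            None, query_name.lower(), candidate["name"].lower()
        ).ratio()
        scored.append((candidate["okkam-id"], similarity))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical candidates as they might arrive from the store.
candidates = [
    {"okkam-id": "id-2", "name": "Alfred Einhorn"},
    {"okkam-id": "id-1", "name": "Albert Einstein"},
]
ranked = rank_candidates("Einstein", candidates)
```

The output has the same shape as the ranked list on the slide: entity identifiers paired with scores in descending order.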

  10. Thank You!
