A Novel Approach for Entity Linkage

A Novel Approach for Entity Linkage • IEEE-IRI2009, Las Vegas, 2009-08-11 • Heiko Stoermer, Paolo Bouquet, University of Trento, Italy • This work is co-funded by the European Commission in the context of the Large-scale Integrated project OKKAM (GA 215032)

Outline • Part 1: Background and Context • Part 2: Problem, Approach, Implementation, Results

Web 2.0 seen from Outer Space • Billions of people and content producers who create and share information (Web 2.0) • Intelligent (semantic-driven) mash-ups based on this information, and its use in new, complex and ubiquitous services

However ...

Flood of Identifiers http://www.reuters.com/news/globalcoverage/barackobama http://www.OPENCALAIS.com/watch?v=z4W2_raF_iw http://en.wikipedia.org/wiki/Barack_obama ?? http://www.facebook.com/home.php#/barackobama?ref=s http://dbpedia.org/resource/Barack_Obama http://farm4.static.flickr.com/3193/2437394249_824e76ed76.jpg?v=0 http://www.linkedin.com/in/barackobama

The Flood of Identifiers • Too many identifiers for the same thing out there … • … not much used in content production • … and poorly interlinked • How do I find out what Web users have to say about our product XYZ? • How can I avoid advertising restaurants in Venice (FL) for a query about Venice (IT)? • How do we collect distributed information about a specific customer or project in a complex Intranet environment? • In short: how can we enable mash-ups based on: select * from Web where ID="…" on the Web of Data or in an enterprise-wide Intranet?

Our Wish for The Web X.0 ...

A Possible Solution -> An Entity Name System for the (Semantic) Web APIs • Open, decentralized service • Provides IDs for annotating any content in any application • Supports reuse of IDs • Maps ID schemas onto each other • Based on HTTP

The ENS – A large "phonebook" • Input: a simple search query; a reference record • Output: a re-usable entity identifier • Under the hood: large-scale entity repository (pre-populated, collaboratively growing); entity matching architecture
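The "phonebook" idea above can be sketched as a lookup-or-create service: a reference record goes in, a re-usable identifier comes out, and the repository grows when no existing entry matches. This is a minimal in-memory sketch, not the ENS API. The class name, the `urn:example:entity:` identifier scheme, and the exact-name comparison are all assumptions made for illustration; the real system uses an entity matching architecture rather than exact comparison.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyEntityNameSystem:
    """Hypothetical in-memory stand-in for an entity repository."""

    # Maps an entity identifier to the reference record stored for it.
    entities: Dict[str, dict] = field(default_factory=dict)

    @staticmethod
    def _normalize(name: str) -> str:
        # Assumption: entities are compared on a case-insensitive "name" only.
        return " ".join(name.lower().split())

    def lookup(self, record: dict) -> Optional[str]:
        """Return the identifier of a stored entity with the same name, if any."""
        name = self._normalize(record.get("name", ""))
        if not name:
            return None
        for entity_id, stored in self.entities.items():
            if self._normalize(stored.get("name", "")) == name:
                return entity_id
        return None

    def resolve(self, record: dict) -> str:
        """Reuse an existing identifier, or mint a new one and store the record."""
        existing = self.lookup(record)
        if existing is not None:
            return existing
        entity_id = f"urn:example:entity:{uuid.uuid4()}"  # made-up ID scheme
        self.entities[entity_id] = dict(record)
        return entity_id


ens = ToyEntityNameSystem()
first = ens.resolve({"name": "Barack Obama"})
second = ens.resolve({"name": "barack  obama", "type": "person"})
other = ens.resolve({"name": "Venice"})
print(first == second, first == other, len(ens.entities))
```

The two differently written records for the same name resolve to one identifier, which is the reuse behaviour the slide describes; everything else about the sketch is a placeholder.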

ENS Overview

Part 2

Entity Matching • Related work under different names: merge-purge, record linkage, deduplication, entity consolidation, entity linkage ... • New aspects: unknown entity representation; unknown query representation; multi-linguality • Our problem: answer an entity search query with a high top-1 success rate in very short time

Bottom-up Study • We asked about 250 individuals from all over the world which feature names they would use to describe a certain set of entity types • Key results: the "name" feature is shared between all analyzed types; the "name" feature has very high relevance for all analyzed types

Name-feature based Entity Similarity

Avoiding "Spam" • Example: Q = {q1, q2}, E = {e1, e2, e3} • Establish fsim() for every pair (q, e) • Select only the maximally similar pairs • Build the final score between Q and E
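The three steps on this slide can be sketched as follows. The slide does not define fsim() or say how the final score is built, so two things here are assumptions: `difflib.SequenceMatcher` stands in for the similarity function, and the final score is taken as the mean of the best match per query term. Keeping only the best pair per query term means an entity cannot raise its score by piling up many weakly matching values, which is one plausible reading of "avoiding spam".

```python
from difflib import SequenceMatcher
from typing import Sequence


def fsim(q: str, e: str) -> float:
    """Stand-in string similarity in [0, 1]; the authors' fsim() may differ."""
    return SequenceMatcher(None, q.lower(), e.lower()).ratio()


def entity_score(query_terms: Sequence[str], entity_values: Sequence[str]) -> float:
    """Score an entity E against a query Q using only maximum-similarity pairs."""
    if not query_terms or not entity_values:
        return 0.0
    # Steps 1 and 2: compute fsim for every pair (q, e), keep the best e per q.
    best_per_term = [
        max(fsim(q, e) for e in entity_values) for q in query_terms
    ]
    # Step 3 (assumption): aggregate the selected pairs by their mean.
    return sum(best_per_term) / len(best_per_term)


query = ["barack", "obama"]
good_entity = ["Barack", "Obama", "president"]
spammy_entity = ["obama"] * 50  # many repeated values, only one query term covered

print(entity_score(query, good_entity))
print(entity_score(query, spammy_entity))
```

With this aggregation the repeated values of the second entity count once, so it scores lower than the entity that covers both query terms.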

Results • Benchmark based on 67 example queries • ~590k entities • Top-1 improvement of ~12% over reference algorithm • No performance penalty
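For readers unfamiliar with the metric, a top-1 success rate can be computed as the fraction of queries whose first returned result is the expected entity. The sketch below uses invented toy data, not the benchmark from the slide, and the function and variable names are hypothetical. The slide also does not say whether the ~12% is an absolute or a relative improvement, so the sketch makes no claim about that.

```python
from typing import Dict, List


def top1_success_rate(
    ranked_results: Dict[str, List[str]], expected: Dict[str, str]
) -> float:
    """Fraction of queries whose top-ranked result is the expected entity ID."""
    if not expected:
        return 0.0
    hits = 0
    for query, entity_id in expected.items():
        results = ranked_results.get(query, [])
        # A query counts as a success only if the first result is correct.
        if results and results[0] == entity_id:
            hits += 1
    return hits / len(expected)


# Toy data for illustration only (not taken from the benchmark).
expected = {"q1": "e1", "q2": "e3", "q3": "e9"}
ranked = {"q1": ["e1", "e2"], "q2": ["e5", "e3"], "q3": []}

print(top1_success_rate(ranked, expected))
```

In the toy data only the first query has the right entity at rank 1; the second has it at rank 2, which does not count under a top-1 measure.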

Future Work • Improved similarity measure based on a knowledge model inferred from our study • Evaluation in the context of the 2009 Ontology Matching Contest (entity track)

Thank You! Contact stoermer@disi.unitn.it if you are interested in using the ENS in your experiments/projects/solutions.
