Entity Search: Are you searching for what you want? Kevin C. Chang. Joint work with: Bin He, Zhen Zhang, Chengkai Li, Govind Kabra, Shui-Lung Chuang, Joe Kelley, Tao Cheng, Bill Davis, Mitesh Patel, Dave Killian
Let’s start with the new universal greeting… What have you been reading lately? What have you been searching lately?
From the MetaQuerier to WISDM: I am becoming superficial… Kevin's two projects in the four quadrants (axes: Surface Web vs. Deep Web; Access vs. Structure)
First Question: Where is U. of Illinois? Can we search it?
What have you been searching lately? • The university and area of Kevin Chang? • The email of Marc Snir? • Customer service phone number of Amazon? • What profs are doing databases at UIUC? • The papers and presentations of ICDE 2007? • Due date of SIGMOD 2007? • Sale price of “Canon PowerShot A400”? • “Hamlet” books available at bookstores?
Challenge of the surface Web: Despite all the glorious search engines… Are we searching for what we want?
Function follows view: What is “the Web”? Or: How do search engines view the Web?
What is an "entity"? Your target of information, or anything: • Phone number • Email address • PDF • Image • Person name • Book title, author, … • Price (of something)
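To make "entity" concrete, here is a minimal sketch of tagging a few of the listed types in raw page text. The regular expressions, the `EntityInstance` record, and `tag_entities` are hypothetical illustrations, not the WISDM tagger; note that this toy tagger is binary, whereas entity search treats recognition as probabilistic.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately simple patterns for three entity types from the list.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"(?:\d{3}-)?\d{3}-\d{4}"),
    "price": re.compile(r"\$\d+(?:\.\d{2})?"),
}

@dataclass(frozen=True)
class EntityInstance:
    etype: str  # entity type, e.g. "email"
    text: str   # surface string found on the page
    pos: int    # token position within the page

def tag_entities(text: str) -> list[EntityInstance]:
    """Tag every whitespace token that fully matches one of the patterns."""
    found = []
    for pos, token in enumerate(text.split()):
        token = token.strip(",;().")  # drop trailing punctuation only
        for etype, pattern in PATTERNS.items():
            if pattern.fullmatch(token):
                found.append(EntityInstance(etype, token, pos))
    return found
```

For example, `tag_entities("Contact winslett@cs.uiuc.edu or 333-3536, price $12.00.")` yields an email at token 1, a phone at token 3, and a price at token 5, while a fragment like `"jhan@"` is simply rejected rather than given a low confidence.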
From pages to entities: Traditional Search vs. Entity Search
Demo. We built Ver. 0.1 to understand the promises and issues. • Three scenarios: • Academic: CS sites, DBLP homepages. • E-commerce: books, cellphones. • Yellow pages: comprehensive corpus.
Example application: Question answering. Q: Who are DB profs at UIUC? A: Geneva Belford, Kevin C. Chang, AnHai Doan, Jiawei Han, Marianne Winslett, ChengXiang Zhai. WISDM pipeline: Query Generation → query: #dtf-nnuw100(#entity(professor) #entity(university) #entity(research Database Systems, Data Mining, IR)) → Querying → Filtering & Validation → results: ranked list of (<prof, univ, research>, )
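The query above asks for a professor, a university, and a research area that co-occur. A minimal sketch of that matching step follows, assuming (not confirmed by the slide) that `nnuw100` behaves like an Indri-style unordered window: all entities must fall within a 100-token span, in any order. The function name and the `(etype, text, pos)` triple format are hypothetical.

```python
from itertools import product

def match_unordered_window(entities, required_types, window):
    """Return one text tuple per combination of entities (one per required type)
    whose token positions all fit inside a span of `window` tokens, in any order.

    `entities` is a list of (etype, text, pos) triples tagged on a single page.
    """
    # Group candidate instances by type so each combination picks one per type.
    by_type = {t: [e for e in entities if e[0] == t] for t in required_types}
    matches = []
    for combo in product(*(by_type[t] for t in required_types)):
        positions = [pos for _, _, pos in combo]
        # Unordered window: only the spread of positions matters, not their order.
        if max(positions) - min(positions) < window:
            matches.append(tuple(text for _, text, _ in combo))
    return matches
```

On a toy page where "Kevin C. Chang", "UIUC", and "Database Systems" sit at tokens 0, 3, and 10 and another professor appears at token 200, only the first combination survives a window of 100.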
Example application: Relation construction. WISDM pipeline: App-specific Entity Tagging (tagging: #entity(prof)) → Querying (query: #tf-nnow50(#entity(professor) #tf-nnuw20(#entity(email) #entity(phone)))) → Relation Construction → results: ranked list of (<prof, phone, email>, )
Resulting relation <prof, phone, email>:
  prof | phone | email
  David DeWitt | 608-263-5489 | dewitt@cs.wisc.edu
  Marianne Winslett | 333-3536 | winslett@cs.uiuc.edu
  … | … | …
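A minimal sketch of how the nested relation-construction query might be evaluated over tagged pages. It assumes, as an interpretation rather than the system's documented semantics, that `nnuw20` pairs an email and a phone within 20 tokens in either order, that `nnow50` requires the professor to precede that pair within 50 tokens, and that `tf` aggregates by counting matches across pages. All names are hypothetical.

```python
from collections import Counter

def pairs_within(entities, type_a, type_b, window):
    """Unordered co-occurrence: (a_text, b_text, first_pos, last_pos) for every
    type_a / type_b pair whose positions differ by less than `window` tokens."""
    out = []
    for ta, a_text, a_pos in entities:
        if ta != type_a:
            continue
        for tb, b_text, b_pos in entities:
            if tb == type_b and abs(a_pos - b_pos) < window:
                out.append((a_text, b_text, min(a_pos, b_pos), max(a_pos, b_pos)))
    return out

def construct_relation(pages, outer=50, inner=20):
    """Count <prof, phone, email> tuples across pages, most frequent first."""
    counts = Counter()
    for entities in pages:
        contact_pairs = pairs_within(entities, "email", "phone", inner)
        for etype, prof, p_pos in entities:
            if etype != "professor":
                continue
            for email, phone, start, end in contact_pairs:
                # Ordered outer window: the professor precedes the contact pair
                # and the whole match spans fewer than `outer` tokens.
                if p_pos < start and end - p_pos < outer:
                    counts[(prof, phone, email)] += 1
    return counts.most_common()
```

Counting across pages is what turns scattered co-occurrences into a ranked relation: a tuple seen on two pages outranks one seen once, and a professor mentioned only after the contact pair is not matched.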
Example application: Best-effort integration. Price of "Hamlet"? WISDM pipeline: Query Generation → query: #od50(#entity(title Hamlet) #entity(price)) → Querying → Validation & Ranking → results: ranked list of (<title, price>, ), e.g., Buy.com: $10.99, Amazon.com: $12.00, …
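A minimal sketch of best-effort integration for the "Hamlet" price query. It reads `#od50` as an Indri-style ordered window (the price must follow the title within 50 tokens), which is an assumption, and ranks each (title, price) answer by how many distinct sources support it while keeping the provenance. The function names and the toy source data are hypothetical.

```python
from collections import defaultdict

def ordered_window(entities, first_type, first_text, second_type, window):
    """Yield second_type texts that follow a first_type entity equal to
    first_text by fewer than `window` tokens."""
    for t1, x1, p1 in entities:
        if t1 != first_type or x1 != first_text:
            continue
        for t2, x2, p2 in entities:
            if t2 == second_type and 0 < p2 - p1 < window:
                yield x2

def price_lookup(sources, title, window=50):
    """Rank (title, price) answers by the number of sources that support them."""
    support = defaultdict(set)
    for source, entities in sources.items():
        for price in ordered_window(entities, "title", title, "price", window):
            support[(title, price)].add(source)
    return sorted(((tp, sorted(srcs)) for tp, srcs in support.items()),
                  key=lambda item: (-len(item[1]), item[0]))
```

Keeping the source set per answer mirrors the slide's output, where each price is shown with the site it came from.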
How different is “entity search”? How to define such searches?
Why is entity search different… • Probabilistic entities, vs. a page is for sure a page. • Contextual patterns, vs. matching a page by its content. • Holistic aggregates, vs. a page occurs only once. • Associative results, vs. we never search for pairs of pages.
Consider the entire process of page retrieval: 1. Input: pages. 2. Criteria: content keywords (e.g., "Marc Snir"). 3. Scope: each page itself. 4. Output: one page per result.
Entity search is thus different: 1. Input: probabilistic entities. 2. Criteria: contextual patterns. 3. Scope: holistic aggregates. 4. Output: associative results.
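The contrast between the two processes can be sketched side by side. Both functions below are hypothetical toys: `page_retrieval` matches each page on its own content and returns pages, while `entity_search` takes confidence-scored entities, applies a proximity pattern, aggregates evidence over all pages, and returns ranked pairs. The confidence product is an assumed scoring choice, and the email addresses are made-up examples.

```python
from collections import defaultdict

def page_retrieval(pages, keywords):
    """Traditional search: each page is judged on its own content keywords,
    and each result is one page."""
    return [pid for pid, page in pages.items()
            if all(k in page["text"].split() for k in keywords)]

def entity_search(pages, type_a, type_b, window):
    """Entity search sketch: (1) probabilistic entities (etype, text, pos, conf),
    (2) a contextual proximity pattern, (3) evidence aggregated over all pages,
    (4) associative results as ranked (a, b) pairs."""
    scores = defaultdict(float)
    for page in pages.values():
        for ta, xa, pa, ca in page["entities"]:
            for tb, xb, pb, cb in page["entities"]:
                if ta == type_a and tb == type_b and abs(pa - pb) < window:
                    scores[(xa, xb)] += ca * cb  # confidence-weighted evidence
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

With two pages that mention "Marc Snir" next to the same address, page retrieval returns two separate pages, whereas entity search returns a single pair whose score sums the evidence from both.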
What are technical challenges? Or, how to write (reviewer-friendly) papers?
Issue #1. EntityRank: How to rank entities? Say, Jiawei Han with #email, #phone, #researcharea. • Entity matters: Is "jhan@" an email? Is "2-3457" a phone? • Context matters: order, distance. • Frequency matters: How often does Jiawei Han co-occur with "data mining"? • Associativity matters: e.g., "webmaster@cs.uiuc.edu", "algorithm". • Source matters: Where did you get this info from?
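One way to see how the five factors could combine is the toy scorer below. It is not the EntityRank formula; it is a hypothetical sketch that multiplies a source weight, the entity's recognition confidence, and a distance-decayed context weight (with an assumed 0.5 penalty for wrong order), sums over all occurrences for frequency, and divides by the log of the candidate's background popularity so that ubiquitous strings such as a webmaster address are discounted. The decay constant, the penalty, and all names and addresses are assumptions.

```python
import math
from collections import defaultdict

def context_weight(distance, ordered_ok, decay=10.0):
    """Context matters: closer co-occurrences count more; wrong order is
    penalized by an assumed factor of 0.5."""
    w = math.exp(-abs(distance) / decay)
    return w if ordered_ok else 0.5 * w

def entity_rank(observations, background, source_weight):
    """Score candidates of one target type (e.g. #email) for one query entity.

    observations: (source, candidate_text, entity_conf, distance, ordered_ok)
    background: number of pages each candidate appears on overall
    source_weight: trust per source (default 1.0)
    """
    local = defaultdict(float)
    for source, text, conf, distance, ordered_ok in observations:
        local[text] += (source_weight.get(source, 1.0)            # source matters
                        * conf                                    # entity matters
                        * context_weight(distance, ordered_ok))   # context matters
    # Frequency matters: `local` already sums over every occurrence.
    # Associativity matters: discount candidates that are common everywhere.
    ranked = {text: score / math.log(2 + background.get(text, 0))
              for text, score in local.items()}
    return sorted(ranked.items(), key=lambda kv: -kv[1])
```

In a toy run, a generic webmaster address seen three times on a low-trust forum still ranks below a specific address seen twice on a department page, because its background frequency is orders of magnitude higher.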
Issue #2. Query processing: How to optimize? Q: #tf-nnow50(#entity(professor[David DeWitt]) fax #entity(phone)). Plan (figure): a tf aggregate grouped by phone, over an nnow50 join of the selection prof = "…" on #entity(professor) with "fax"-#entity(phone), served from a pre-materialized context index.
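A minimal sketch of the plan in the figure, assuming the "pre-materialized context index" stores, for each (keyword, entity type), the entity instances that have that keyword within a few tokens before them. With such an index, "fax"-#entity(phone) becomes a single lookup instead of a scan, and only the professor selection and the ordered join remain. The index layout, the 5-token window, the function names, and the 555 numbers are hypothetical.

```python
from collections import Counter, defaultdict

def build_context_index(docs, window=5):
    """Pre-materialize (keyword, entity type) -> {(doc_id, entity_text, entity_pos)}
    for every keyword occurring within `window` tokens before an entity."""
    index = defaultdict(set)
    for doc_id, (tokens, entities) in docs.items():
        for etype, text, epos in entities:
            for kpos in range(max(0, epos - window), epos):
                index[(tokens[kpos].lower(), etype)].add((doc_id, text, epos))
    return index

def phone_by_context(docs, index, prof_name, keyword, outer=50):
    """Sketch of #tf-nnow50(#entity(professor[prof_name]) keyword #entity(phone)):
    look up keyword-phone pairs in the index, join with a selection on the
    professor name (professor must come first), and count per phone."""
    counts = Counter()
    for doc_id, phone, phone_pos in index.get((keyword, "phone"), ()):
        _, entities = docs[doc_id]
        for etype, text, pos in entities:
            if etype == "professor" and text == prof_name and 0 < phone_pos - pos < outer:
                counts[phone] += 1
    return counts.most_common()
```

The design trade-off is the usual one for materialized views: the index costs space and build time for every keyword and entity-type pair, but it answers a frequent contextual pattern without re-scanning documents at query time.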
Conclusion: One step at a time towards… What You Search Is What You Want! (Figure: search, integration, and mining across the surface and deep Web.)
Thank You! And the warriors behind … ShuiLung Chuang Zhen Zhang Chengkai Li Govind Kabra Tao Cheng Arpit Jain Amit Behal David Killian Quoc Le Hanna Zhong Ngoc Bui Sonia Jahid Aniruddh Nath Paul Yuan Sung-Eun Kim Raj Sodhi Yuping Tseng Hemanta Maji