
Entity Search: Are you searching for what you want?


Presentation Transcript


  1. Entity Search: Are you searching for what you want? Kevin C. Chang. Joint work with: Bin He, Zhen Zhang, Chengkai Li, Govind Kabra, Shui-Lung Chuang, Joe Kelley, Tao Cheng, Bill Davis, Mitesh Patel, Dave Killian

  2. Let’s start with the new universal greeting… What have you been reading lately? What have you been searching lately?

  3. From the MetaQuerier to WISDM: I am becoming superficial… Kevin’s 2 projects in the 4 quadrants (diagram axes: Surface Web vs. Deep Web; Access vs. Structure)

  4. First Question: Where is U. of Illinois? Can we search it?

  5. What have you been searching lately? • The university and area of Kevin Chang? • The email of Marc Snir? • Customer service phone number of Amazon? • What profs are doing databases at UIUC? • The papers and presentations of ICDE 2007? • Due date of SIGMOD 2007? • Sale price of “Canon PowerShot A400”? • “Hamlet” books available at bookstores?

  6. Challenge of the surface Web: Despite all the glorious search engines… Are we searching for what we want?

  7. What you search is not what you want.

  8. Function follows view: What is “the Web”? Or: How do search engines view the Web?

  9. They say: Web is a corpus of PAGES.

  10. We take an entity view of the Web:

  11. What is an “entity”? Your target of information, or anything. • Phone number • Email address • PDF • Image • Person name • Book title, author, … • Price (of something)
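One way to make this notion concrete is to treat an entity as a typed text span that carries a confidence score. The sketch below is only an illustration and not the tagger used in the demo: the regular expressions, the confidence values, and the names `EntityOccurrence` and `tag_entities` are all assumptions introduced here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityOccurrence:
    entity_type: str   # "email" or "phone"
    text: str          # the matched span
    start: int         # character offset in the input text
    confidence: float  # assumed extraction confidence in [0, 1]


# (type, pattern, confidence): all three columns are illustrative assumptions.
PATTERNS = [
    ("email", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), 0.95),
    ("phone", re.compile(r"(?<![\d-])\d{3}-\d{3}-\d{4}(?![\d-])"), 0.90),
    # A bare 7-digit number is more ambiguous, so it gets a lower score.
    ("phone", re.compile(r"(?<![\d-])\d{3}-\d{4}(?![\d-])"), 0.60),
]


def tag_entities(text):
    """Return every pattern match as an EntityOccurrence, in text order."""
    found = [
        EntityOccurrence(etype, m.group(), m.start(), conf)
        for etype, pattern, conf in PATTERNS
        for m in pattern.finditer(text)
    ]
    return sorted(found, key=lambda e: e.start)


if __name__ == "__main__":
    sample = "David DeWitt 608-263-5489 dewitt@cs.wisc.edu"
    for occurrence in tag_entities(sample):
        print(occurrence)
```

Keeping the confidence on each occurrence, rather than making a yes/no decision at tagging time, lets a later ranking step weigh uncertain matches against stronger ones.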

  12. From pages to entities Traditional Search Entity Search

  13. Demo. We built Ver. 0.1 to understand the promises and issues. Three scenarios: • Academic: CS sites, DBLP homepages. • ECommerce: Books, Cellphones. • Yellowpage: Comprehensive corpus.

  14. Special Thanks: Data from Stanford WebBase.

  15. Example application: Question answering. Q: Who are DB profs at UIUC? Query: #dtf-nnuw100(#entity(professor) #entity(university) #entity(research Database Systems, Data Mining, IR)). Results: ranked list of (<prof, univ, research>, ). A: Geneva Belford, Kevin C. Chang, AnHai Doan, Jiawei Han, Marianne Winslett, ChengXiang Zhai. (Diagram labels: Query Generation, Querying, Filtering & Validation, WISDM.)

  16. Example application: Relation construction. Tagging: #entity(prof). Query: #tf-nnow50(#entity(professor) #tf-nnuw20(#entity(email) #entity(phone))). Results: ranked list of (<prof, phone, email>, ). Constructed relation <prof, phone, email>: (David DeWitt, 608-263-5489, dewitt@cs.wisc.edu); (Marianne Winslett, 333-3536, winslett@cs.uiuc.edu); … (Diagram labels: App-specific Entity Tagging, Querying, Relation Construction, WISDM.)

  17. Example application: Best-effort integration. Question: Price of “Hamlet”? Query: #od50(#entity(title Hamlet) #entity(price)). Results: ranked list of (<title, price>, ), e.g. Buy.com: $10.99, Amazon.com: $12.00, … (Diagram labels: Query Generation, Querying, Validation & Ranking, WISDM.)
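The slides do not define the query operators. The sketch below assumes that `#od50(A B)` means "A is followed by B within 50 tokens", by analogy with ordered-window operators in other retrieval query languages; that reading is an assumption, not something the slides confirm. The tokenizer, the price pattern, the helper names, and the sample page texts are all hypothetical.

```python
import re
from collections import Counter

TOKEN = re.compile(r"\$?\d+(?:\.\d+)?|\w+")   # prices/numbers or plain words
PRICE = re.compile(r"\$\d+(?:\.\d{2})?")      # stand-in for #entity(price)


def ordered_window_matches(tokens, keyword, window):
    """Yield the nearest price that follows each `keyword` within `window` tokens."""
    for i, tok in enumerate(tokens):
        if tok.lower() != keyword.lower():
            continue
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            if PRICE.fullmatch(tokens[j]):
                yield tokens[j]
                break


def price_candidates(pages, keyword, window=50):
    """Aggregate matches over all pages; return (price, count), most frequent first."""
    counts = Counter()
    for text in pages:
        counts.update(ordered_window_matches(TOKEN.findall(text), keyword, window))
    return counts.most_common()


if __name__ == "__main__":
    # Made-up page snippets, for illustration only.
    pages = [
        "Hamlet paperback now $10.99 at our store",
        "Hamlet by William Shakespeare, list price $12.00",
        "Buy Hamlet today for $10.99 only",
        "Macbeth $8.50",
    ]
    print(price_candidates(pages, "Hamlet"))
```

Counting a candidate once per matching occurrence across the whole corpus, instead of returning one page per hit, is what makes the result an aggregate over pages.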

  18. How different is “entity search”? How to define such searches?

  19. Why is Entity Search different… • Probabilistic entities (vs. a page is for sure a page). • Contextual patterns (vs. match a page by its content). • Holistic aggregates (vs. a page occurs only once). • Associative results (vs. we never search for pairs of pages).

  20. Consider the entire process: Page Retrieval (example keywords: Marc Snir). 1. Input: pages. 2. Criteria: content keywords. 3. Scope: each page itself. 4. Output: one page per result.

  21. Entity search is thus different… Entity Search: 1. Input: probabilistic entities. 2. Criteria: contextual patterns. 3. Scope: holistic aggregates. 4. Output: associative results.

  22. What are technical challenges? Or, how to write (reviewer-friendly) papers?

  23. Issue #1. EntityRank: How to rank entities? Say, Jiawei Han with #email, #phone, #researcharea. • Entity matters: Is “jhan@” an email? Is “2-3457” a phone? • Context matters: order, distance. • Frequency matters: How often is Jiawei Han associated with “data mining”? • Associativity matters: “webmaster@cs.uiuc.edu”, “algorithm”. • Source matters: Where did you get this info from?
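The five factors on this slide can be combined in many ways, and the slides do not give a formula. The sketch below is therefore not the EntityRank method; it is a toy scoring function, under stated assumptions, that multiplies an extraction confidence, a linear distance decay, a source weight, and a specificity discount, then sums over occurrences so that frequency counts. The `Observation` record, the `rank_values` function, the address `jhan@example.edu`, and every number are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    name: str             # person the value was seen near
    value: str            # candidate entity value, e.g. an email address
    confidence: float     # entity matters: extraction confidence in [0, 1]
    distance: int         # context matters: token distance between name and value
    source_weight: float  # source matters: assumed trust in the page


def rank_values(observations, target, window=50):
    """Rank candidate values for `target`, highest (hypothetical) score first."""
    # Associativity: a value seen next to many different names is less specific.
    names_per_value = defaultdict(set)
    for ob in observations:
        names_per_value[ob.value].add(ob.name)

    scores = defaultdict(float)
    for ob in observations:
        if ob.name != target or ob.distance > window:
            continue
        context = 1.0 - ob.distance / (window + 1)       # closer is better
        specificity = 1.0 / len(names_per_value[ob.value])
        # Frequency: each qualifying occurrence adds to the running total.
        scores[ob.value] += ob.confidence * context * ob.source_weight * specificity
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    obs = [
        Observation("Jiawei Han", "jhan@example.edu", 0.9, 3, 1.0),
        Observation("Jiawei Han", "jhan@example.edu", 0.9, 5, 1.0),
        Observation("Jiawei Han", "webmaster@cs.uiuc.edu", 0.9, 2, 1.0),
        Observation("Kevin C. Chang", "webmaster@cs.uiuc.edu", 0.9, 2, 1.0),
        Observation("Marianne Winslett", "webmaster@cs.uiuc.edu", 0.9, 4, 1.0),
    ]
    for value, score in rank_values(obs, "Jiawei Han"):
        print(f"{score:.3f}  {value}")
```

In this toy setup the webmaster address is pushed down because it co-occurs with three different names, which is one simple way to capture the associativity point.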

  24. Issue #2: Query Processing: How to optimize? Q: #tf-nnow50(#entity(professor[David DeWitt]) fax #entity(phone)). Plan operators in the diagram: tf; γ_phone; nnow50; σ_prof=“…” over #entity(professor); “fax”-#entity(phone) (pre-materialized context index).

  25. Conclusion: One step at a time towards … What You Search Is What You Want! (Diagram labels: Search, Integration, Mining; surface, deep.)

  26. Thank You! And the warriors behind …: ShuiLung Chuang, Zhen Zhang, Chengkai Li, Govind Kabra, Tao Cheng, Arpit Jain, Amit Behal, David Killian, Quoc Le, Hanna Zhong, Ngoc Bui, Sonia Jahid, Aniruddh Nath, Paul Yuan, Sung-Eun Kim, Raj Sodhi, Yuping Tseng, Hemanta Maji

