UIUC People Finder

Info • University of Illinois at Urbana-Champaign • Advanced Database Management Systems (CS511) • Instructor: ChengXiang Zhai • Sena Lee (senalee2@uiuc.edu) • Heewon Jung (hjung20@uiuc.edu) • Seung Pyo Lee (slee232@uiuc.edu) • Ricardo Redder (rredder2@uiuc.edu) • John Laipple (laipple@uiuc.edu)

Agenda • Problem • Motivation • Common problem • Definition • Challenges • Solution • Implementation • Retrieval • Interpretation • Decision • Demo • Future work

Motivation • For a given person: • The information about the person stored in relational databases is very limited, e.g., name, age, address. • There is a lot of information about him or her on the Internet, e.g., web pages, papers, blogs, pictures. • Use the best of both worlds

Common problem • Screenshot: search box with the query "ChengXiang Zhai" and a Search button

Phonebook • Screenshot: search box with the query "ChengXiang Zhai" and a Search button

Google Images

Search engines

Entity retrieval • Given: • a set of entities E • a relational table where each tuple describes some aspects of an entity • a set of documents • A user who is interested in an entity ei poses a query (Q) and expects the tuple that represents ei and the documents associated with ei.

Our example • Query = keywords (usually name) • Table = Phonebook • Documents = Results from search engines

Challenges • Semantic problem • It is different from finding a document that is mathematically similar to the query • It is subjective: the final target is in our mind and is not expressed by a function

Solving • Use the information from the relational database to improve the document search • The information from the phonebook is reliable and very accurate • The search engines are more generic; a simple search for a name might not be useful.

Our example again • Screenshot: search box with the query "ChengXiang Zhai" and a Search button

Sequence • User types a query • User clicks the Search button • Application searches in the Phonebook • Application retrieves the information from the Phonebook • Application searches in the search engines, using the previous information
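
The sequence above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory `PHONEBOOK` list, the "Jane Doe" record, and the helper names `lookup_phonebook` and `expand_queries` are all hypothetical and do not come from the original application.

```python
# Hypothetical in-memory phonebook; the real application queries the UIUC Phonebook.
PHONEBOOK = [
    {"name": "Jane Doe", "department": "Computer Science"},
    {"name": "John Roe", "department": "Physics"},
]


def lookup_phonebook(query, phonebook):
    """Return phonebook entries whose name contains the query (case-insensitive)."""
    needle = query.lower()
    return [entry for entry in phonebook if needle in entry["name"].lower()]


def expand_queries(entry):
    """Build search-engine queries that reuse what the phonebook already knows."""
    name = entry["name"]
    queries = [f'"{name}"']
    # Assumed field names; any reliable phonebook attribute could be used here.
    for field in ("department", "address"):
        value = entry.get(field)
        if value:
            queries.append(f'"{name}" {value}')
    return queries


if __name__ == "__main__":
    for entry in lookup_phonebook("jane doe", PHONEBOOK):
        print(expand_queries(entry))
```

Each expanded query would then be sent to the search engines, so that results are narrowed by the reliable phonebook data instead of the bare name.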

Implementing the idea • How to retrieve the information and documents from the web? • How to interpret the results? • How to decide whether a given document relates to the entity or not?

How to retrieve the information and documents from the web?

Web-sites as functions • Search engines • User types the text • Clicks on the button • Reads the results • Clicks on the results • UIUC People Finder • Application sends the text to the search engine (steps 1, 2) • Stores the results (steps 3, 4)

Using the exposed HTTP interface • Search engines • Use GET or POST methods to receive information • Send the results in HTML • Application • Converts the query to a GET or POST request and sends it • Reads the HTML

Wrappers • Receive the text • Build the appropriate URL • Connect to the URL • Read the response • Diagram: Query text → Wrapper → HTML • Example: http://www.google.com/search?hl=en&q=chengxiang+zhai&btnG=Google+Search
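
A minimal sketch of such a wrapper, using only the Python standard library. The base URL and the `hl`, `q`, and `btnG` parameters are copied from the example URL on the slide; the function names and the `User-Agent` string are assumptions made for illustration, and a present-day search engine may reject or reshape such requests.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Base URL and parameter names are taken from the slide's example URL.
GOOGLE_SEARCH = "http://www.google.com/search"


def build_search_url(query_text, base_url=GOOGLE_SEARCH):
    """Receive the text and build the appropriate GET URL."""
    params = {"hl": "en", "q": query_text, "btnG": "Google Search"}
    # urlencode escapes spaces as '+', matching the slide's example.
    return base_url + "?" + urlencode(params)


def fetch_html(url, timeout=10):
    """Connect to the URL and read the response as text."""
    # The User-Agent value is a placeholder, not something the original specifies.
    request = Request(url, headers={"User-Agent": "people-finder-sketch"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


if __name__ == "__main__":
    print(build_search_url("chengxiang zhai"))
```

Keeping URL construction separate from the network call lets one wrapper per search engine differ only in its base URL and parameter names.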

How to interpret the results?

HTML – good for humans

HTML – hard for computers

How do we interpret? • Visual language • Different styles → different meanings • Underline → Links • Useful information → Center

Extraction from HTML • HTML is tag-based: < > • Different styles • <font size =…> • <h2> • <bgcolor =…> • Links • <a href = …> • Center • <body>
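
One way to pull links and headings out of a result page is Python's standard `html.parser` module. The sketch below is illustrative only: the class name `ResultExtractor` and the choice to collect just `<a href>` and `<h2>` elements are assumptions, not the original application's extractor.

```python
from html.parser import HTMLParser


class ResultExtractor(HTMLParser):
    """Collect (href, anchor text) pairs and <h2> heading texts from HTML."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.headings = []
        self._href = None        # href of the <a> currently open, if any
        self._link_text = []
        self._in_h2 = False
        self._h2_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._link_text = []
        elif tag == "h2":
            self._in_h2 = True
            self._h2_text = []

    def handle_data(self, data):
        # Text may belong to a link, a heading, or both at once.
        if self._href is not None:
            self._link_text.append(data)
        if self._in_h2:
            self._h2_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._link_text).strip()))
            self._href = None
        elif tag == "h2" and self._in_h2:
            self.headings.append("".join(self._h2_text).strip())
            self._in_h2 = False


if __name__ == "__main__":
    parser = ResultExtractor()
    parser.feed('<h2>Results</h2><a href="http://example.edu/page">A page</a>')
    print(parser.headings, parser.links)
```

Style tags such as `<font>` are ignored here; a fuller extractor could record them to weight text by visual prominence, as the slide suggests.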

How to decide whether a given document relates to the entity or not?

How do we decide? • Look for related information • Context • Names • Other information

Application • Search for keywords found in the Phonebook. • Search for the name • Search for the department • Search for the address • etc. • Rank the pages • Name → +100 points • Department → +50 points • Email → +250 points
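
The point scheme above can be sketched as a simple additive score. The weights (+100, +50, +250) come from the slide; everything else is an assumption made for illustration: the function names, the case-insensitive substring match, and the "Jane Doe" sample data.

```python
# Weights from the slide: name +100, department +50, email +250.
WEIGHTS = {"name": 100, "department": 50, "email": 250}


def score_page(page_text, phonebook_entry, weights=WEIGHTS):
    """Add the points for every phonebook field whose value appears in the page."""
    text = page_text.lower()
    score = 0
    for field, points in weights.items():
        value = phonebook_entry.get(field)
        # Assumed matching rule: plain case-insensitive substring search.
        if value and value.lower() in text:
            score += points
    return score


def rank_pages(pages, phonebook_entry):
    """Return (url, score) pairs sorted from highest to lowest score."""
    scored = [(url, score_page(text, phonebook_entry)) for url, text in pages.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical phonebook entry and page texts.
    entry = {"name": "Jane Doe", "department": "Computer Science",
             "email": "jdoe@example.edu"}
    pages = {
        "http://example.edu/a": "Jane Doe, Computer Science, jdoe@example.edu",
        "http://example.edu/b": "A blog post mentioning Jane Doe",
    }
    print(rank_pages(pages, entry))
```

An email match outweighs the other two fields combined, which fits the idea that an email address identifies a person far more reliably than a name.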

Problem • Performance • Problem: Search engines return thousands or even millions of results • Solution: Limit the number of retrieved web pages • Problem: Even with a limit on the number of analyzed web pages, many pages are still accessed • Solution: Cache
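
Both solutions (limiting the retrieved pages and caching them) can be sketched together. This is a minimal in-memory illustration: the class name `PageCache`, the `max_pages` default, and the injected `fetch` function are assumptions, and the slides do not say how the original cache was stored.

```python
class PageCache:
    """Fetch each URL at most once and cap how many pages a query may retrieve."""

    def __init__(self, fetch, max_pages=20):
        self._fetch = fetch          # function url -> html, e.g. an HTTP wrapper
        self._max_pages = max_pages  # assumed limit; the slides give no number
        self._store = {}

    def get(self, url):
        """Return the cached page, fetching it only on the first request."""
        if url not in self._store:
            self._store[url] = self._fetch(url)
        return self._store[url]

    def get_many(self, urls):
        """Retrieve only the first max_pages results of a result list."""
        return {url: self.get(url) for url in urls[: self._max_pages]}


if __name__ == "__main__":
    cache = PageCache(lambda url: f"<html>{url}</html>", max_pages=2)
    print(cache.get_many(["u1", "u2", "u3"]))
```

Passing the fetch function in from outside keeps the cache independent of any particular search engine wrapper and makes it easy to test without network access.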

Final architecture • Diagram labels: Query text, Searchers, Google, Yahoo, Phonebook, www (online), cache (offline), Information, Picture, Documents

Demo

Future work • Extend to other domains • MySpace, ACM, Papers, Blogs, etc… • Automatic link extraction • Better ranking function • User feedback • Owner feedback

Questions

Thank you
