CSE 450 – Web Mining Seminar Professor Brian D. Davison Fall 2005

A Presentation on Searching the Workplace Web by R. Fagin, R. Kumar, K. McCurley, J. Novak, D. Sivakumar, J. Tomlin & D. Williamson (WWW2003, Budapest, Hungary), given by Osama Ahmed Khan on 11/03/2005 (It’s my birthday!).

Presentation Transcript


  1. CSE 450 – Web Mining Seminar
     Professor Brian D. Davison, Fall 2005
     A Presentation on Searching the Workplace Web
     R. Fagin, R. Kumar, K. McCurley, J. Novak, D. Sivakumar, J. Tomlin & D. Williamson
     WWW2003, Budapest, Hungary
     by Osama Ahmed Khan, 11/03/2005 (It’s my birthday!)

  2. Problem
     • Intranet search vs. Internet search
     Solution
     • A case study of IBM’s intranet
     Definition
     • Intranet: a corporate network that is at once similar to and different from the Internet

  3. Internet
     • Democratic: reflects the collective voice of many authors
     • Interesting content: designed to attract user traffic (Axiom 1)
     • Targets various ‘Best Answers’ (Axiom 2)
     • Spam-influenced: many independent authorities contribute content (Axiom 3)
     • Search-engine-friendly (Axiom 4)

  4. Intranet
     • Autocratic: reflects the view of the entity it serves
     • Informative content: created to disseminate information rather than to attract traffic (Axiom 1)
     • Targets a single ‘Right Answer’ (Axiom 2)
     • Spam-free: a small number of authorities build the content (Axiom 3)
     • Not search-engine-friendly (Axiom 4)

  5. Two-phase Approach
     • Identify a variety of ranking functions based on heuristic and experimental analysis of intranet structure
     • Combine them through a rank aggregation architecture

  6. IBM’s Dataset
     • Unbiased: results may apply to other organizations

  7. System Architecture
     • Crawler: stores and produces structured data
     • Duplicate Elimination: selects a preferred representative from each group of similar pages
     • Inverted Indexing: 3 indices (content, title, anchor text)
     • Global Ranking: 7 static lists (PageRank, indegree, discovery date, URL words, URL length, URL depth, discriminator)
     • Query Runtime System
     • Result Markup and Presentation

  8. Rank Aggregation Architecture
     • Input: multiple ranked lists produced by the individual heuristics
     • Output: a final ranked list that minimizes the total number of ‘inversions’ (pairs ordered differently) relative to the individual ranked lists
     • Plug-and-play: individual heuristics can be added or removed

  9. Rank Aggregation Architecture (Contd.)

  10. Experimental Results

  11. Experimental Results (Contd.)

  12. Conclusion
     • Intranets and the Internet have different structures
     • Keeping the ranking functions separate makes it easy to select the best combination of heuristics

  13. Thank You
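Slide 7 lists URL length, URL depth, and indegree among the seven static (query-independent) ranked lists. The Python sketch below shows one way such lists might be built. The definitions are assumptions, not taken from the paper: depth is counted as non-empty path segments, shorter and shallower URLs are ranked higher, and the function names `url_depth` and `static_rankings` are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def url_depth(url: str) -> int:
    """Count non-empty path segments (an assumed definition of 'URL depth')."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])


def static_rankings(urls, links):
    """Build three query-independent ranked lists, best page first.

    urls:  list of page URLs
    links: list of (source_url, target_url) hyperlink pairs
    Returns a dict mapping a hypothetical heuristic name to a ranked URL list.
    Ties keep their input order because sorted() is stable.
    """
    indegree = Counter(target for _, target in links)
    return {
        # Assumption: shorter URLs are preferred.
        "url_length": sorted(urls, key=len),
        # Assumption: shallower URLs are preferred.
        "url_depth": sorted(urls, key=url_depth),
        # Pages with more incoming links rank higher (missing pages count 0).
        "indegree": sorted(urls, key=lambda u: indegree[u], reverse=True),
    }


if __name__ == "__main__":
    root = "http://intranet.example.com/"
    hr = root + "hr/"
    benefits = root + "hr/benefits/index.html"
    links = [(root, hr), (benefits, hr), (hr, root)]
    for name, ranked in static_rankings([root, benefits, hr], links).items():
        print(name, ranked)
```

Each heuristic produces its own complete list rather than a combined score, which is what lets the aggregation step treat the heuristics as interchangeable plug-ins.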
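Slide 8 states the aggregation objective: a final list with the fewest total ‘inversions’ against the input lists. Counting inversions between two rankings gives the Kendall tau distance, and the list minimizing their sum is known as a Kemeny-optimal aggregation. The Python sketch below is a brute-force illustration under that reading, with hypothetical function names. Exact Kemeny aggregation is NP-hard in general, so this is not necessarily the method the authors used, and it is only feasible for a handful of items.

```python
from itertools import combinations, permutations


def inversions(candidate, ranking):
    """Kendall tau distance: count item pairs the two rankings order differently.

    Both arguments are full rankings (lists or tuples) over the same items.
    """
    position = {item: i for i, item in enumerate(ranking)}
    # combinations() preserves input order, so a precedes b in candidate.
    return sum(1 for a, b in combinations(candidate, 2) if position[a] > position[b])


def kemeny_aggregate(rankings):
    """Return the ordering with the fewest total inversions against all inputs.

    Exhaustive search over n! orderings: exact, but only feasible for small n.
    """
    best = min(
        permutations(rankings[0]),
        key=lambda cand: sum(inversions(cand, r) for r in rankings),
    )
    return list(best)


if __name__ == "__main__":
    heuristics = [["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]]
    print(kemeny_aggregate(heuristics))  # ['A', 'B', 'C']
```

In this example, the ordering A, B, C disagrees with the three input lists on only 2 pairs in total; every other ordering disagrees on at least 3.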
