Research on Enterprise Track of TREC 2007. Huizhong Duan, Qi Zhou, Zhen Lu, Ou Jin, Shenghua Bao, Yunbo Cao and Yong Yu Apex Knowledge & Data Management Lab. Presenter: Yangbo Zhu. Document Search. Outline. Static Ranking Approaches. Link Sparse; Similar, Small Rank. HostRank Algorithm.
Research on Enterprise Track of TREC 2007 Huizhong Duan, Qi Zhou, Zhen Lu, Ou Jin, Shenghua Bao, Yunbo Cao and Yong Yu Apex Knowledge & Data Management Lab Presenter: Yangbo Zhu
Static Ranking Approaches • The link graph is sparse, so link-based ranks come out similar and small.
Calculate the Host’s Importance http://www.csiro.au/science http://www.atnf.csiro.au www.atnf.csiro.au/~rgooch http://www.ento.csiro.au
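A minimal sketch of how host-level importance could be computed: page-level links are collapsed into a host graph, and a standard PageRank power iteration is run over the hosts. The slides do not preserve the exact HostRank formula, so the damping factor, the iteration count, the handling of dangling hosts, and the names `host_of`, `host_graph` and `host_rank` are all assumptions made for illustration.

```python
from urllib.parse import urlparse


def host_of(url):
    """Return the lowercase host of a URL, tolerating a missing scheme."""
    if "://" not in url:
        url = "http://" + url
    return urlparse(url).netloc.lower()


def host_graph(page_links):
    """Collapse page-level links (src_url, dst_url) into host-level edges."""
    graph = {}
    for src, dst in page_links:
        s, d = host_of(src), host_of(dst)
        graph.setdefault(s, set())
        graph.setdefault(d, set())
        if s != d:  # links inside one host carry no host-level vote
            graph[s].add(d)
    return graph


def host_rank(graph, damping=0.85, iterations=50):
    """Plain PageRank over hosts (assumed stand-in for the slide's HostRank)."""
    hosts = sorted(graph)
    n = len(hosts)
    rank = {h: 1.0 / n for h in hosts}
    for _ in range(iterations):
        # Hosts without outlinks spread their mass uniformly over all hosts.
        dangling = sum(rank[h] for h in hosts if not graph[h])
        new = {h: (1 - damping) / n + damping * dangling / n for h in hosts}
        for h in hosts:
            for target in graph[h]:
                new[target] += damping * rank[h] / len(graph[h])
        rank = new
    return rank
```

Feeding it a few hypothetical links between the hosts named on the slide (for example from www.csiro.au and www.ento.csiro.au to www.atnf.csiro.au) yields one score per host, with the most linked-to host ranked highest. Working at the host level gives a denser graph than the page level, which is the motivation the slide points at.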
Propagation of the Host’s Importance • Hierarchical Weight Structure www.atnf.csiro.au/computing www.atnf.csiro.au/computing/software www.atnf.csiro.au/computing/software/smongo
Propagation of the Host’s Importance • The factor ω is defined in terms of two quantities: • Index(p) is a boolean value denoting whether page p is an index page. • Link(p) is defined as the percentage of the inlinks of page p. Reference: G. Xue, Q. Yang, H. Zeng, Y. Yu, Z. Chen: Exploiting the Hierarchical Structure for Link Analysis. In: Proceedings of SIGIR 2005.
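The propagation step can be illustrated with a small sketch that pushes a host's score down its directory hierarchy. The formula for ω is not preserved in these slides, so `omega` below is a purely hypothetical placeholder (a linear mix of Index(p) and Link(p) with an assumed parameter `alpha`); it is not the weighting from the cited SIGIR 2005 paper. The function names and the rule of splitting a parent's score among its children in proportion to ω are likewise assumptions.

```python
from collections import defaultdict


def omega(is_index, inlink_share, alpha=0.5):
    """HYPOTHETICAL weight; the slide's real definition of ω is not preserved."""
    return alpha * (1.0 if is_index else 0.0) + (1 - alpha) * inlink_share


def parent_of(path):
    """Parent directory of a URL path: '/a/b' -> '/a', '/a' -> ''."""
    return path.rsplit("/", 1)[0]


def propagate(host_score, pages, alpha=0.5):
    """Distribute host_score top-down over a path hierarchy.

    pages maps a URL path to (is_index, inlink_share). Every path's parent
    is assumed to be either another key of pages or the host root ''.
    """
    children = defaultdict(list)
    for path in pages:
        children[parent_of(path)].append(path)

    scores = {"": host_score}
    queue = [""]
    while queue:
        parent = queue.pop(0)
        kids = children.get(parent, [])
        weights = [omega(*pages[k], alpha=alpha) for k in kids]
        total = sum(weights)
        for kid, w in zip(kids, weights):
            # Fall back to an equal split when all weights are zero.
            share = w / total if total > 0 else 1.0 / len(kids)
            scores[kid] = scores[parent] * share
            queue.append(kid)
    return scores
```

With paths such as /computing, /computing/software and /computing/software/smongo, each level inherits a fraction of its parent's score, so index pages and pages that attract more inlinks end up with a larger share of the host's importance.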
Title Extraction • Figure: page regions labeled Title, H1, H2, H1.
Body Detection • Dividing the page based on DOM tree structure.
Body Detection • Filtering divided parts
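The two body-detection steps above (divide, then filter) can be sketched with Python's standard `html.parser`. This is a simplification under stated assumptions: the page is cut at the boundaries of a hand-picked set of block-level tags rather than by a full DOM subtree analysis, and the filter criteria (a minimum text length and a maximum share of link text) are illustrative guesses, since the slides do not state which criteria were used. `BlockSplitter` and `detect_body` are hypothetical names.

```python
from html.parser import HTMLParser

# Assumed set of tags whose boundaries separate page parts.
BLOCK_TAGS = {"div", "td", "p", "ul", "table", "section"}


class BlockSplitter(HTMLParser):
    """Cut a page into text blocks at block-level tag boundaries."""

    def __init__(self):
        super().__init__()
        self.blocks = []      # finished blocks as (text, link_chars)
        self._text = []
        self._link_chars = 0
        self._in_link = 0
        self._skip = 0        # depth inside <script>/<style>

    def flush(self):
        text = " ".join("".join(self._text).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._text, self._link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self._skip:
            return
        self._text.append(data)
        if self._in_link:
            self._link_chars += len(data.strip())


def detect_body(html, min_chars=40, max_link_density=0.5):
    """Keep blocks that are long enough and not dominated by link text."""
    parser = BlockSplitter()
    parser.feed(html)
    parser.close()
    parser.flush()
    return [text for text, link_chars in parser.blocks
            if len(text) >= min_chars
            and link_chars / len(text) <= max_link_density]
```

On a typical page this drops navigation bars (almost entirely link text) and short footers, and keeps the longer prose blocks as the body.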
Outline • Reference: S. Bao, H. Duan, Q. Zhou, M. Xiong, Y. Cao and Y. Yu: Research on Expert Search at Enterprise Track of TREC 2006. In: Proceedings of the 15th Text REtrieval Conference (TREC 2006), 2006.
Topic Sensitive ExpertRank
Parsing Corpus for Expert Name List • Some addresses appear in anti-spam formats
Parsing Corpus for Expert Name List • zywang@noble.org • Emails with a single letter in the person-name part • No.10@csiro.au • publishing.emu@csiro.au • a.scott@dem.csiro.au
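A rough sketch of how such addresses might be harvested and sorted into groups. The anti-spam spellings handled here (bracketed "[at]" and "[dot]") are assumed examples, because the formats shown on the original slide are not preserved. The category labels, the `enterprise_domain` default, and the function names are also hypothetical.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def deobfuscate(text):
    """Undo bracketed anti-spam spellings such as 'a [at] b [dot] au' (assumed formats)."""
    text = re.sub(r"\s*[\[(]\s*at\s*[\])]\s*", "@", text, flags=re.I)
    text = re.sub(r"\s*[\[(]\s*dot\s*[\])]\s*", ".", text, flags=re.I)
    return text


def extract_emails(text):
    """Return every address found after deobfuscation."""
    return EMAIL_RE.findall(deobfuscate(text))


def classify(address, enterprise_domain="csiro.au"):
    """Sort an address into a hypothetical category for name-list building."""
    local, _, domain = address.lower().partition("@")
    inside = domain == enterprise_domain or domain.endswith("." + enterprise_domain)
    if not inside:
        return "external"
    parts = [p for p in re.split(r"[._-]", local) if p]
    if any(not p.isalpha() for p in parts):
        return "non-person"      # digits in the local part, e.g. No.10
    if any(len(p) == 1 for p in parts):
        return "initial-only"    # single letter in the person-name part
    return "candidate"
```

Under these rules zywang@noble.org is flagged as external, No.10@csiro.au as non-person, and a.scott@dem.csiro.au as initial-only. A role mailbox such as publishing.emu@csiro.au still passes as a candidate, so a real pipeline would need an additional check (for instance a list of known role names) that this sketch does not attempt.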
VisualPageRank and Expert Homepage Detection • VisualPageRank • Too simple: Too complicated:
VisualPageRank and Expert Homepage Detection • Example of Expert Homepage