
WWW 2007, May 8–12, 2007, Banff, Alberta, Canada.

Web Page Classification with Heterogeneous Data Fusion. Zenglin Xu, Irwin King and Michael R. Lyu, Department of Computer Science and Engineering, The Chinese University of Hong Kong, {zlxu, king, lyu}@cse.cuhk.edu.hk.


Presentation Transcript


  1. The Chinese University of Hong Kong

Web Page Classification with Heterogeneous Data Fusion
Zenglin Xu, Irwin King and Michael R. Lyu
Department of Computer Science and Engineering, The Chinese University of Hong Kong
{zlxu, king, lyu}@cse.cuhk.edu.hk

1 Motivations
• For web page classification, there are many available data sources, such as the text, the title, the meta data, the anchor text, etc.
• Simply putting them together would not greatly enhance the classification performance.
• Data sources of different dimensions and types can be represented in a common format: the kernel matrix.
• A kernel learning approach is thus proposed to integrate multiple data sources.

2 Contributions
• A systematic way of integrating multiple data sources.
• Better classification accuracy.

3 Architecture & Model
• 1. Feature Extraction.
• 2. Similarity Representation. Each data source is represented as a kernel matrix (Ki).
• 3. Similarity Combination.
• 4. Classification.
• Substitute K into the dual SVM.
• We have the following QCQP problem: [equation not captured in the transcript]
• where α is the parameter of the dual SVM, δ is a constant and t is the trace vector.

4 Experiment results
• Dataset: DMOZ
• AT: Anchor Text
• LT: Link Text
• MT: Meta Data
• TI: Title
• PT: Plain Text
• UW: Universally Weighted sources
• KC: sources by Kernel Combination
• Mi-F1: Micro-F1
• Ma-F1: Macro-F1
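
The similarity representation and similarity combination steps in the transcript (one kernel matrix Ki per data source, then a single combined kernel K) can be illustrated with a small sketch. This is not the authors' method: the poster learns the combination by solving a QCQP, whereas the example below uses fixed, hand-set uniform weights. The linear kernel, the trace normalization, the function names, and the toy term counts are all assumptions made for illustration.

```python
# Illustrative sketch only: fixed-weight kernel combination, not the
# QCQP-based kernel learning described in the poster.

def linear_kernel(features):
    """Gram matrix of dot products between sparse term-count dicts."""
    n = len(features)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Iterate over the smaller dict and look terms up in the larger one.
            small, large = sorted((features[i], features[j]), key=len)
            dot = sum(v * large.get(term, 0.0) for term, v in small.items())
            K[i][j] = K[j][i] = float(dot)
    return K


def trace_normalize(K):
    """Rescale K so that trace(K) equals the number of samples.

    This puts kernels from different sources on a comparable scale
    (an assumed preprocessing choice, not taken from the poster).
    """
    n = len(K)
    trace = sum(K[i][i] for i in range(n))
    if trace == 0:
        return [row[:] for row in K]
    scale = n / trace
    return [[value * scale for value in row] for row in K]


def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices: K = sum_i weights[i] * kernels[i].

    Non-negative weights keep the result positive semidefinite when
    every input kernel is positive semidefinite.
    """
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    n = len(kernels[0])
    K = [[0.0] * n for _ in range(n)]
    for Ki, w in zip(kernels, weights):
        for i in range(n):
            for j in range(n):
                K[i][j] += w * Ki[i][j]
    return K


# Hypothetical toy data: three pages, two sources (title and anchor text).
title_terms = [
    {"python": 1, "tutorial": 1},
    {"python": 1, "news": 1},
    {"football": 1, "news": 1},
]
anchor_terms = [
    {"code": 2},
    {"code": 1, "blog": 1},
    {"sports": 3},
]

K_title = trace_normalize(linear_kernel(title_terms))
K_anchor = trace_normalize(linear_kernel(anchor_terms))

# Uniform weights as a simple baseline for the combination step.
K_combined = combine_kernels([K_title, K_anchor], [0.5, 0.5])

for row in K_combined:
    print([round(value, 3) for value in row])
```

A combined matrix like `K_combined` could then be handed to any SVM implementation that accepts a precomputed kernel for the classification step. Replacing the fixed weights with learned ones is where the kernel learning approach in the poster would come in.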
