Web Page Classification with Heterogeneous Data Fusion
Zenglin Xu, Irwin King and Michael R. Lyu
Department of Computer Science and Engineering
The Chinese University of Hong Kong
{zlxu, king, lyu}@cse.cuhk.edu.hk
1 Motivations
• For web page classification, many data sources are available, such as the text, the title, the meta data, and the anchor text.
• Simply putting them together would not greatly enhance the classification performance.
• Data sources of different dimensions and types can be represented in a common format: the kernel matrix.
• A kernel learning approach is thus proposed to integrate multiple data sources.

2 Contributions
• A systematic way of integrating multiple data sources.
• Better classification accuracy.

3 Architecture & Model
• 1. Feature Extraction.
• 2. Similarity Representation. Each data source is represented as a kernel matrix (Ki).
• 3. Similarity Combination.
• 4. Classification.
• Substitute K into the dual SVM.
• We have the following quadratically constrained quadratic programming (QCQP) problem:
  [equation missing in the source]
• where α is the parameter of the dual SVMs, δ is a constant and t is the trace vector.

4 Experiment results
• Dataset: DMOZ
• AT: Anchor Text
• LT: Link Text
• MT: Meta Data
• TI: Title
• PT: Plain Text
• UW: Universally Weighted sources
• KC: sources by Kernel Combination
• Mi-F1: Micro-F1
• Ma-F1: Macro-F1

WWW 2007, May 8–12, 2007, Banff, Alberta, Canada.
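Steps 2 and 3 of the architecture (similarity representation and similarity combination) can be illustrated with a minimal sketch. It assumes a linear kernel per data source, trace normalization, and fixed uniform weights, which roughly corresponds to the UW baseline rather than the learned kernel combination (KC). The feature values, function names, and the normalization choice are hypothetical and are not taken from the poster; the poster instead obtains the weights by solving the QCQP.

```python
def linear_kernel(features):
    """Gram matrix K[i][j] = <x_i, x_j> for one data source."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in features]
            for xi in features]


def trace(kernel):
    """Sum of the diagonal entries of a square matrix."""
    return sum(kernel[i][i] for i in range(len(kernel)))


def normalize_trace(kernel):
    """Rescale a kernel so that its trace equals the number of pages.

    Assumption: this puts sources of different scale on a common footing;
    the kernel must have a nonzero trace.
    """
    scale = len(kernel) / trace(kernel)
    return [[scale * value for value in row] for row in kernel]


def combine_kernels(kernels, weights):
    """Weighted sum K = sum_i w_i * K_i.

    Nonnegative weights keep K positive semidefinite when every K_i is.
    """
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative")
    n = len(kernels[0])
    return [[sum(w * k[r][c] for w, k in zip(weights, kernels))
             for c in range(n)]
            for r in range(n)]


# Hypothetical toy data: 3 pages, two sources with different dimensions.
title_features = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
anchor_features = [[2.0, 0.0, 1.0], [1.0, 1.0, 0.0], [0.0, 0.0, 3.0]]

# Step 2: one kernel matrix per data source, all of size 3 x 3.
kernels = [normalize_trace(linear_kernel(f))
           for f in (title_features, anchor_features)]

# Step 3: uniform weights as a stand-in for the learned combination.
uniform_weights = [0.5, 0.5]
combined = combine_kernels(kernels, uniform_weights)
```

The combined matrix has one row and one column per page regardless of each source's feature dimension, which is what allows it to be substituted for K in a dual SVM that accepts a precomputed kernel.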