Clustering Documents in a Web Directory
Presenter: Shu-Ya Li
Authors: Giordano Adami, Paolo Avesani, Diego Sona
WIDM 2003
Outline
• Motivation
• Objective
• Methodology
• Experiments and Results
• Conclusion
• Personal Comments
[Figure: example taxonomy with nodes Primates, Monkey, Apes, Gorillas, Chimpanzees]
Motivation
• Bootstrapping a large hierarchy with a suitable set of labeled examples is a critical issue.
• Bootstrapping
  • Automatically annotate a labeled taxonomy with a flat set of documents, helping users design their own data structures;
  • The user can then remove documents that were assigned to the wrong node.
Bootstrapping
• Generate candidate documents from the Web
• Classify the candidate documents
• Have experts filter out misclassified documents
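The three bootstrapping steps can be sketched as a small pipeline. This is a hypothetical illustration, not the authors' tool: retrieving candidates from the Web is assumed to have already produced a list of documents, and `classify` and `expert_accepts` are placeholder callbacks standing in for the automatic classifier and the human expert.

```python
from typing import Callable


def bootstrap(
    candidates: list[str],
    classify: Callable[[str], str],
    expert_accepts: Callable[[str, str], bool],
) -> tuple[dict[str, list[str]], list[tuple[str, str]]]:
    """Classify candidate documents, then keep only the assignments an expert confirms."""
    accepted: dict[str, list[str]] = {}
    rejected: list[tuple[str, str]] = []
    for doc in candidates:
        node = classify(doc)  # step 2: automatic classification into a taxonomy node
        if expert_accepts(doc, node):  # step 3: expert review of the proposed node
            accepted.setdefault(node, []).append(doc)
        else:
            rejected.append((doc, node))  # misclassified: filtered out by the expert
    return accepted, rejected
```

The rejected pairs are the documents the user would otherwise have to remove by hand after automatic annotation.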
Objectives
• This paper aims to develop a supporting tool that reduces the human effort required to annotate a taxonomy with examples.
• Address the bootstrapping problem better than standard prototype-based classifiers, such as:
  • a baseline approach
  • the “constrained” K-means approach
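A minimal sketch of the two comparison approaches follows. It assumes each taxonomy node is represented only by its label term. The baseline assigns each document to the node whose label prototype has the highest cosine similarity. The K-means variant starts its centroids at those label prototypes, so each cluster keeps its node identity. Reading “constrained” as this label seeding is an assumption; the paper's exact constraint may differ. All function names are hypothetical.

```python
import math
from collections import Counter


def encode(doc: str, vocab: list[str]) -> list[float]:
    """Word-frequency vector over `vocab`, L2-normalized."""
    counts = Counter(doc.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def label_prototypes(labels: dict[str, str], vocab: list[str]) -> dict[str, list[float]]:
    """One prototype per node: a one-hot vector on the node's label term (assumed)."""
    return {node: [1.0 if w == label else 0.0 for w in vocab] for node, label in labels.items()}


def classify(x: list[float], prototypes: dict[str, list[float]]) -> str:
    """Baseline: assign a document to the most similar prototype."""
    return max(prototypes, key=lambda n: cosine(x, prototypes[n]))


def seeded_kmeans(docs: list[list[float]], prototypes: dict[str, list[float]], iters: int = 10):
    """K-means whose centroids are initialized at the label prototypes."""
    centroids = dict(prototypes)
    for _ in range(iters):
        clusters: dict[str, list[list[float]]] = {n: [] for n in centroids}
        for x in docs:
            clusters[classify(x, centroids)].append(x)
        for node, members in clusters.items():
            if members:  # keep the previous centroid if a node receives no documents
                centroids[node] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, [classify(x, centroids) for x in docs]
```

Neither approach uses the parent/child structure of the taxonomy. This is the knowledge the conclusion credits TaxSOM with exploiting.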
Methodology - TaxSOM
• Encode all documents in the data set as fixed-size, normalized vectors (word frequencies over a vocabulary).
• Initialize the node weights (models) randomly, while forcing the presence of each node's label (e.g., by giving the label term the maximum frequency).
• Start learning by iteratively updating the weights.
[Figure: TaxSOM learning diagram (labels: node labels, compare, pattern, codebooks)]
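The TaxSOM steps can be illustrated with a SOM-style sketch in which neighborhood is measured as path length in the taxonomy rather than distance on a grid. None of the following choices come from the slides; they are assumptions: L2 normalization, "forcing" a label by setting its component above every random weight, a Gaussian neighborhood over tree distance, and a linearly decaying learning rate. Names such as `init_codebooks` and `train` are hypothetical.

```python
import math
import random
from collections import Counter


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def encode(doc: str, vocab: list[str]) -> list[float]:
    """Fixed-size word-frequency vector over `vocab`, L2-normalized (assumed norm)."""
    counts = Counter(doc.lower().split())
    return normalize([float(counts[w]) for w in vocab])


def init_codebooks(labels: dict[str, str], vocab: list[str], rng: random.Random) -> dict[str, list[float]]:
    """Random codebook per node; the node's label term is forced above all random weights."""
    index = {w: i for i, w in enumerate(vocab)}
    books = {}
    for node, label in labels.items():
        vec = [rng.random() for _ in vocab]
        vec[index[label]] = max(vec) + 1.0  # hypothetical way of "forcing" the label
        books[node] = normalize(vec)
    return books


def tree_distance(a: str, b: str, parent: dict[str, str]) -> int:
    """Number of edges between two nodes, via their lowest common ancestor."""
    def path_to_root(n: str) -> list[str]:
        path = [n]
        while n in parent:
            n = parent[n]
            path.append(n)
        return path

    pa, pb = path_to_root(a), path_to_root(b)
    common = next(n for n in pa if n in pb)
    return pa.index(common) + pb.index(common)


def best_node(x: list[float], books: dict[str, list[float]]) -> str:
    """Winning node: highest dot product (a cosine, since all vectors are unit length)."""
    return max(books, key=lambda n: sum(a * b for a, b in zip(x, books[n])))


def train(docs, books, parent, epochs=20, lr=0.5, sigma=1.0):
    """SOM-style updates where the neighborhood is defined by the taxonomy tree."""
    for t in range(epochs):
        alpha = lr * (1 - t / epochs)  # linearly decaying learning rate
        for x in docs:
            winner = best_node(x, books)
            for node, w in books.items():
                d = tree_distance(winner, node, parent)
                h = math.exp(-(d * d) / (2 * sigma * sigma))  # Gaussian over tree distance
                books[node] = normalize([wi + alpha * h * (xi - wi) for wi, xi in zip(w, x)])
    return books
```

With a small `sigma`, updates stay close to the winning node. Larger values spread each document's influence to parent, sibling, and child nodes.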
Conclusion
• We proposed the TaxSOM model, which outperforms the baseline and K-means approaches by explicitly incorporating taxonomy knowledge into the model.
Personal Comments
• Advantage
  • …
• Drawback
  • …
• Application
  • Web Directory