WebSummarization: Automatic Logo and Trademark Extraction from Web Sites. Evdoxios Baratis, Euripides G.M. Petrakis, Dept. of Electr. & Comp. Engin., Technical Univ. of Crete (TUC), Greece. Evangelos E. Milios, Faculty of Comp. Science, Dalhousie University, Halifax, Canada.
WebSummarization is a system for extracting the most important images from a Web site.
• Logos and trademarks are important features that characterize the identity of corporate Web sites or of the products presented on such sites
• http://www.intelligence.tuc.gr/websummarization

1. Logo/Trademark Extraction based on Image Features
• Features are computed on intensity and frequency histograms
• Machine learning discriminates logos/trademarks from images of other categories
• For each image, its probability of being a logo/trademark is computed

2. Clustering of similar images
• Similar logos/trademarks appear more than once in Web sites
• Identical or similar images are grouped together into clusters
• Clustering is based on features and machine learning

3. Image Ranking: select the most characteristic images from a Web site
• An image is selected from each cluster
• Image importance = Probability × Depth × Instances
• Probability is the logo probability of an image
• Instances is the number of appearances of an image
• Depth = (MaxDepth + 1 – LinkDepth) / MaxDepth

4. Web Summary: the k most important clusters
• Cluster importance = Σ_i ImageImportance_i, summed over the images i of the cluster

Department of Electronic & Computer Engineering, Technical University of Crete (TUC)
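The ranking and summary formulas in steps 3 and 4 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the `Image` record, the function names, and the rule for picking the representative image of a cluster (here, the highest-scoring member) are hypothetical, and link depth is assumed to start at 1 for the site's entry page so that Depth equals 1 there. Logo probabilities and cluster ids are taken as given from steps 1 and 2.

```python
from dataclasses import dataclass


@dataclass
class Image:
    url: str
    logo_prob: float  # probability of being a logo/trademark (step 1)
    instances: int    # number of appearances of the image in the site
    link_depth: int   # assumed 1-based: 1 = entry page
    cluster_id: int   # cluster label from step 2


def depth_weight(link_depth: int, max_depth: int) -> float:
    # Depth = (MaxDepth + 1 - LinkDepth) / MaxDepth
    return (max_depth + 1 - link_depth) / max_depth


def image_importance(img: Image, max_depth: int) -> float:
    # Image importance = Probability * Depth * Instances
    return img.logo_prob * depth_weight(img.link_depth, max_depth) * img.instances


def summarize(images: list[Image], max_depth: int, k: int) -> list[tuple[int, str, float]]:
    # Group images by cluster id.
    clusters: dict[int, list[Image]] = {}
    for img in images:
        clusters.setdefault(img.cluster_id, []).append(img)

    scored = []
    for cid, members in clusters.items():
        # Cluster importance = sum of the importance of its images.
        total = sum(image_importance(m, max_depth) for m in members)
        # Assumption: the representative is the highest-scoring member.
        best = max(members, key=lambda m: image_importance(m, max_depth))
        scored.append((total, cid, best))

    # Keep the k most important clusters.
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(cid, best.url, total) for total, cid, best in scored[:k]]
```

For example, `summarize(images, max_depth=2, k=1)` returns a one-element list holding the id of the top cluster, the URL of its representative image, and the cluster importance.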