Searching for Logo and Trademark Images on the Web. Euripides G.M. Petrakis * Epimenidis Voutsakis * Evangelos Milios ** * Technical University of Crete, Chania, Greece ** Dalhousie University, Halifax, Canada. Retrieval of Logo & Trademarks.
Retrieval of Logos & Trademarks
• Important, characteristic signs of corporate Web sites and of the products presented there
• Comprise 32.6% of the total number of images on the Web
• Retrieval of logos & trademarks is of significant commercial interest
  • E.g., detection of unauthorized usage
http://www.intelligence.tuc.gr/intellisearch
Image Retrieval on the Web
• Text queries: keywords, free text
• Answers: images in Web pages with similar text
• Retrieved images are not always relevant, or are relevant but not important
  • Important: from Web sites of corporations and organizations
  • Less important: from individuals and small companies
• Link analysis: assign higher ranking to answers from important Web sites
Contributions
• Enhance accuracy of retrievals
• Support queries by image example
• Give preference to images from important Web sites
• Evaluation of state-of-the-art methods
  • Retrieval by text
  • Retrieval by image content
  • Retrieval by importance
  • Combination of the above
Image Content Representation
• Text surrounding images in Web pages
  • Image filename, alternate text, page title, caption
• Image features computed on intensity & energy histograms
• Mean & variance on histograms
• Moment invariants on raw images
• Count of the number of distinct intensity levels
Histograms
• Intensity spectrum: distribution of intensity values
• Energy spectrum: distribution of average energy over concentric rings of the DFT
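The two histograms on this slide could be computed roughly as follows. This is a minimal sketch, not the authors' implementation: the function names, the number of rings, the ring boundaries, and the naive DFT (usable only on tiny images) are all assumptions made for illustration.

```python
import cmath
import math


def intensity_histogram(image, levels=256):
    """Normalized distribution of intensity values.

    `image` is a list of rows of integers in [0, levels).
    """
    hist = [0.0] * levels
    count = 0
    for row in image:
        for value in row:
            hist[value] += 1
            count += 1
    return [h / count for h in hist]


def dft2(image):
    """Naive 2-D DFT (O(N^4)); adequate only for very small images."""
    rows, cols = len(image), len(image[0])
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0j
            for y in range(rows):
                for x in range(cols):
                    angle = -2 * math.pi * (u * y / rows + v * x / cols)
                    total += image[y][x] * cmath.exp(1j * angle)
            out[u][v] = total
    return out


def energy_histogram(image, rings=4):
    """Average |F(u, v)|^2 over concentric rings around zero frequency.

    Equal-width rings are an assumption; the slides do not specify them.
    """
    rows, cols = len(image), len(image[0])
    spectrum = dft2(image)
    max_radius = math.hypot(rows // 2, cols // 2)
    sums = [0.0] * rings
    counts = [0] * rings
    for u in range(rows):
        for v in range(cols):
            # Map indices to signed frequencies so radius is measured from DC.
            fu = u if u <= rows // 2 else u - rows
            fv = v if v <= cols // 2 else v - cols
            radius = math.hypot(fu, fv)
            ring = min(int(rings * radius / max_radius), rings - 1)
            sums[ring] += abs(spectrum[u][v]) ** 2
            counts[ring] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]
```

A flat image puts all of its energy in the innermost ring, whereas an image with sharp edges spreads energy to the outer rings. That property is what makes the energy spectrum useful for spotting images with rich frequency content.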
Logo & Trademark Detection
• Distinguish logos and trademarks from images of other categories
  • Small images
  • Few intensity levels
  • Rich frequency content
• Image features form vectors, which are used to train a decision tree
  • Accuracy: 85%
• Each image is assigned a probability of being a logo or trademark
• Retrieval gives more emphasis to images with high logo-trademark probability
Logo-Trademark Similarity
• $S_{\text{image-similarity}}(Q, D) = S_{\text{features}} + S_{\text{text}}$
• $S_{\text{features}} = S_{\text{moment-invariants}} + S_{\text{intensity-histogram}} + S_{\text{energy-histogram}}$
• $S_{\text{text}} = S_{\text{image-caption}} + S_{\text{file-name}} + S_{\text{alt-text}} + S_{\text{page-title}}$
Image Retrieval by Text
• Compute the text similarity between the image and query text descriptions using the Vector Space Model (VSM)
• Text is represented by vectors of tf.idf term weights
• $Q = (q_1, q_2, \ldots, q_N)$, $D = (d_1, d_2, \ldots, d_N)$
• Similarity is computed between the vectors Q and D
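The similarity formula itself is not preserved in this text. Cosine similarity is a common choice for the Vector Space Model, so the sketch below assumes it. The helper names, the logarithmic idf variant, and the sparse dictionary representation are likewise illustrative assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse tf.idf vectors.

    `docs` is a list of token lists. Returns (vectors, idf), where each
    vector is a {term: weight} dict. Uses idf = log(N / df), one of
    several common variants (an assumption).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / df[term]) for term in df}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(doc).items()}
        for doc in docs
    ]
    return vectors, idf


def cosine(q, d):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * d.get(term, 0.0) for term, w in q.items())
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0
    return dot / (norm_q * norm_d)


# Hypothetical text descriptions of three images.
docs = [["nike", "logo"], ["adidas", "logo"], ["nike", "shoes"]]
vectors, idf = tfidf_vectors(docs)
query = {"nike": idf["nike"]}
ranking = sorted(range(len(docs)), key=lambda i: cosine(query, vectors[i]), reverse=True)
```

Here `ranking` orders the descriptions by decreasing similarity to the query; descriptions that share no term with the query score zero.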
Retrieval by Image Features
• The similarity between histograms is computed as their intersection
• The similarity between moment invariants is computed as vector similarity
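A minimal sketch of the two feature similarities follows. Histogram intersection is taken here as the sum of bin-wise minima over normalized histograms. The slides do not say which "vector similarity" is used for the moment invariants, so the distance-based form below is only one possible reading, and both function names are hypothetical.

```python
import math


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima; equals 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def vector_similarity(a, b):
    """Similarity derived from Euclidean distance (an assumed form).

    Returns 1.0 for identical vectors and decreases towards 0 as the
    vectors move apart.
    """
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + distance)
```

Both functions return values in (0, 1] for valid inputs, which makes it straightforward to add them into a single feature score.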
Link Analysis
• Assign importance to Web pages and images
• Main idea: co-cited and co-contained images are likely to be related
• PageRank and HITS for text retrieval
• PicASHOW for Web pages with images, using links alone
• WPicASHOW handles image and text content in queries and Web pages
Focused Graph F
• Retrieve an initial set F of images
  • Stop images (banners, buttons) are filtered out
  • Non-logo/trademark images are filtered out (based on probability)
• Expand F with pages pointing to images in F
• Expand F with pages and images pointed to by pages in F
• Repeat until F is sufficiently large
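The expansion steps above can be sketched as follows. All names, the dictionary-based graph representation, the probability threshold, and the stopping parameters are hypothetical. The slides do not say whether the filters are reapplied during expansion, so this sketch applies them to the initial set only.

```python
def build_focused_graph(seed_images, page_images, page_links, logo_prob,
                        stop_images=frozenset(), min_prob=0.5,
                        target_size=50, max_rounds=5):
    """Grow a focused graph from an initial set of retrieved images.

    page_images: {page: set of images the page contains or points to}
    page_links:  {page: set of pages the page links to}
    logo_prob:   {image: probability of being a logo or trademark}
    Returns (pages, images).
    """
    # Initial set: drop stop images and unlikely logos/trademarks.
    images = {
        img for img in seed_images
        if img not in stop_images and logo_prob.get(img, 0.0) >= min_prob
    }
    pages = set()
    for _ in range(max_rounds):
        size_before = len(images) + len(pages)
        # Add pages pointing to images already in F.
        pages |= {p for p, imgs in page_images.items() if imgs & images}
        # Add pages and images pointed to by pages in F.
        for page in list(pages):
            pages |= page_links.get(page, set())
            images |= page_images.get(page, set())
        size = len(images) + len(pages)
        if size >= target_size or size == size_before:
            break  # large enough, or nothing new was reachable
    return pages, images
```

The loop stops either when the graph reaches `target_size` nodes or when a round adds nothing new, so it terminates even on a small crawl.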
Example of Focused Graph
WPicASHOW
• Create the focused graph F
• Weighted links: the similarity between queries and images is used to regulate the influence of links in F
• Authorities: principal eigenvector of $[(W+I)M]^{T}(W+I)M$
  • W: page-to-page relationships in F
  • M: page-to-image relationships in F
• Answers: rank answers by authority value (component of the principal eigenvector)
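As an illustration of the authority computation, the sketch below forms $[(W+I)M]^{T}(W+I)M$ from small dense matrices and approximates its principal eigenvector by power iteration. Power iteration, the dense list-of-lists representation, and the function names are assumptions; the slides name only the matrix. The entries of W may carry similarity-based weights instead of 0/1 values.

```python
import math


def matmul(a, b):
    """Product of two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def image_authorities(W, M, iterations=100):
    """Approximate the principal eigenvector of [(W+I)M]^T (W+I)M.

    W: pages x pages matrix (page-to-page links, possibly weighted).
    M: pages x images matrix (page-to-image relationships).
    Returns one authority score per image (unit Euclidean norm).
    """
    n = len(W)
    w_plus_i = [[W[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    a = matmul(w_plus_i, M)
    cocitation = matmul(transpose(a), a)  # images x images
    scores = [1.0] * len(cocitation)
    for _ in range(iterations):
        scores = [sum(c * s for c, s in zip(row, scores)) for row in cocitation]
        norm = math.sqrt(sum(s * s for s in scores))
        if norm == 0.0:
            return scores  # no page-image relationships at all
        scores = [s / norm for s in scores]
    return scores


# Two pages, two images: page 0 links to page 1; each page holds one image.
W = [[0.0, 1.0], [0.0, 0.0]]
M = [[1.0, 0.0], [0.0, 1.0]]
scores = image_authorities(W, M)
```

In this toy graph the image on page 1 receives the higher score, because page 1 is pointed to by page 0 in addition to containing the image itself.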
Evaluation
• Database assembled locally by a crawler
  • 1.5M pages with images
• Text queries: VSM, PicASHOW, WPicASHOW
• Image queries (example image + text): VSM, WPicASHOW
• Average precision/recall on the top 30 answers
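Precision and recall over the top 30 answers, averaged across queries, can be computed along these lines. The function names and the input layout (a ranked answer list plus a set of relevant answers per query) are assumptions made for this sketch.

```python
def precision_recall_at_k(ranked, relevant, k=30):
    """Precision and recall over the top-k answers of one query."""
    top = ranked[:k]
    hits = sum(1 for answer in top if answer in relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_over_queries(runs, k=30):
    """Mean precision and recall over (ranked_answers, relevant_set) pairs."""
    pairs = [precision_recall_at_k(ranked, relevant, k) for ranked, relevant in runs]
    n = len(pairs)
    return (sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n)
```

Precision here is divided by the number of answers actually returned, so a query that yields fewer than `k` answers is not penalized for the missing ones.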
Text Queries
Queries by Text and Image
Conclusions
• VSM retrieves relevant but not always important answers
• PicASHOW retrieves important but not always relevant answers
• WPicASHOW is a good compromise between relevance and importance
• The size of the data set is a problem
Web Implementation
• Try the system at http://www.intelligence.tuc.gr/intellisearch
• Selection of retrieval method
• Link analysis methods
• And more