Nonnegative Shared Subspace Learning and Its Application to Social Media Retrieval Presenter: Andy Lim
Paper Topic • Folksonomy • Social media sharing platforms
The Problem • Rise in popularity of social image and video sharing platforms • Precision of tag-based media retrieval • Tags are • Noisy • Ambiguous • Incomplete • Subjective • Lack of constraints • Free-text tags (e.g. “djfja;sldfkj”) • Example tags: hotdog, chinese, trololol, aidjishi, sandwich, bread
Previous Research (Internal) • Improving tag relevance • Sigurbjörnsson and van Zwol • Developed a method for recommending a set of relevant tags based on tag popularity • Li et al. • List all images for a given tag and determine tag relevance from visual similarity • All such methods are confined to the noisy tags within the primary dataset
The Approach • Internal vs. External • Leverage external auxiliary sources of information to improve the target tagging system (which is presumably much noisier) • Exploit the disparate characteristics of the target domain using the auxiliary source • Note: What is the optimal level of joint modeling such that the target domain still benefits from the auxiliary source?
Assumptions • There is a common underlying subspace shared by the primary and secondary domains • The primary domain is much noisier than the secondary domain
Nonnegative Matrix Factorization • X (M x N nonnegative data matrix): N documents, each described over M vocabulary words • F (M x R nonnegative matrix): R basis vectors • H (R x N nonnegative matrix): coordinates of each document in that basis • Factorization: X ≈ FH
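To make the roles of X, F and H concrete, here is a minimal NumPy sketch of NMF. It assumes the classic multiplicative update rules for the squared (Frobenius) error, which the slides do not specify; the function name `nmf`, the iteration count and the toy data are illustrative only.

```python
import numpy as np

def nmf(X, R, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative X (M x N) as F @ H, with F (M x R) and H (R x N)."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    F = rng.random((M, R)) + eps   # R nonnegative basis vectors (columns)
    H = rng.random((R, N)) + eps   # nonnegative coordinates of each document
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative.
        H *= (F.T @ X) / (F.T @ F @ H + eps)
        F *= (X @ H.T) / (F @ H @ H.T + eps)
    return F, H

# Toy data: a nonnegative matrix of rank 3 (6 "words", 8 "documents").
rng = np.random.default_rng(1)
X = rng.random((6, 3)) @ rng.random((3, 8))
F, H = nmf(X, R=3)
rel_err = np.linalg.norm(X - F @ H) / np.linalg.norm(X)
print(F.shape, H.shape, round(float(rel_err), 4))
```

Each column of `H` is the low-dimensional representation of one document; this is the representation that is compared against a query at retrieval time.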
Joint Shared Nonnegative Matrix Factorization (JSNMF) • Input: • X (target domain), Y (auxiliary domain), R1 and R2 (dimensionality of the underlying subspaces of X and Y), K (number of shared basis vectors) • Output: • W (joint shared subspace), U (remaining subspace in the target domain), V (remaining subspace in the auxiliary domain), H (coordinate matrix for the target domain), L (coordinate matrix for the auxiliary domain)
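The inputs and outputs above can be illustrated with a rough sketch. It assumes that X ≈ [W | U] H and Y ≈ [W | V] L, that both domains share the same M vocabulary rows, and that each domain's squared error is weighted by the inverse of its squared norm. The multiplicative updates below follow from that assumed objective and are not necessarily the exact rules used in the paper; `jsnmf` and the toy data are hypothetical.

```python
import numpy as np

def jsnmf(X, Y, R1, R2, K, n_iter=300, eps=1e-9, seed=0):
    """Factor X ~ [W|U] @ H and Y ~ [W|V] @ L with nonnegative factors.

    X is M x N1 and Y is M x N2. W (M x K) is shared by both domains;
    U (M x (R1-K)) and V (M x (R2-K)) are specific to X and Y.
    """
    rng = np.random.default_rng(seed)
    M = X.shape[0]
    W = rng.random((M, K)) + eps
    U = rng.random((M, R1 - K)) + eps
    V = rng.random((M, R2 - K)) + eps
    H = rng.random((R1, X.shape[1])) + eps
    L = rng.random((R2, Y.shape[1])) + eps
    # Relative weight of the auxiliary term (assumption: errors are
    # normalized by the squared Frobenius norm of each data matrix).
    lam = np.linalg.norm(X) ** 2 / np.linalg.norm(Y) ** 2
    for _ in range(n_iter):
        F, G = np.hstack([W, U]), np.hstack([W, V])
        H *= (F.T @ X) / (F.T @ F @ H + eps)
        L *= (G.T @ Y) / (G.T @ G @ L + eps)
        Hw, Hu, Lw, Lv = H[:K], H[K:], L[:K], L[K:]
        # The shared basis W is pulled toward both datasets at once.
        W *= (X @ Hw.T + lam * Y @ Lw.T) / (F @ H @ Hw.T + lam * G @ L @ Lw.T + eps)
        # Domain-specific bases only see their own dataset.
        U *= (X @ Hu.T) / (np.hstack([W, U]) @ H @ Hu.T + eps)
        V *= (Y @ Lv.T) / (np.hstack([W, V]) @ L @ Lv.T + eps)
    return W, U, V, H, L

# Toy data: two domains that share 2 of their basis vectors.
rng = np.random.default_rng(1)
M = 12
shared = rng.random((M, 2))
X = np.hstack([shared, rng.random((M, 2))]) @ rng.random((4, 30))
Y = np.hstack([shared, rng.random((M, 1))]) @ rng.random((3, 25))
W, U, V, H, L = jsnmf(X, Y, R1=4, R2=3, K=2)
err_x = np.linalg.norm(X - np.hstack([W, U]) @ H) / np.linalg.norm(X)
err_y = np.linalg.norm(Y - np.hstack([W, V]) @ L) / np.linalg.norm(Y)
print(W.shape, U.shape, V.shape, round(float(err_x), 4), round(float(err_y), 4))
```

In this sketch, K = 0 gives two independent factorizations (no sharing), while K = R1 = R2 forces every basis vector to serve both domains.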
Retrieval using JSNMF • Input: W, U, H, query sentence SQ, number N of images (or videos) to retrieve, and the image (or video) dataset • Output: Top N retrieved images (or videos)
Experiment • Use LabelMe tags (auxiliary) to improve • Image retrieval in Flickr • Video retrieval in YouTube • Why LabelMe? • Object image tagging • Controlled vocabulary
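One plausible way to turn these inputs into a ranking is sketched below. It assumes the query sentence SQ has already been converted into a length-M term vector `q`, that the query is projected onto the target basis [W | U] with a pseudo-inverse, and that items are ranked by cosine similarity to the columns of H. These choices are assumptions of the sketch; the slides do not spell out the procedure, and `retrieve` and the toy vocabulary are hypothetical.

```python
import numpy as np

def retrieve(W, U, H, q, n_top):
    """Return indices of the n_top items whose columns in H best match query q."""
    F = np.hstack([W, U])                # target-domain basis, M x R1
    h_q = np.linalg.pinv(F) @ q          # query coordinates in the subspace
    # Cosine similarity between the query and every item (column of H).
    sims = (h_q @ H) / (np.linalg.norm(h_q) * np.linalg.norm(H, axis=0) + 1e-12)
    return np.argsort(-sims)[:n_top]

# Toy vocabulary: [water, river, street, bus]
W = np.array([[1.0], [1.0], [0.0], [0.0]])   # shared "water/river" basis vector
U = np.array([[0.0], [0.0], [1.0], [1.0]])   # target-only "street/bus" basis vector
H = np.array([[1.0, 0.0, 0.5],               # three items in subspace coordinates
              [0.0, 1.0, 0.5]])
q = np.array([0.0, 1.0, 0.0, 0.0])           # query: "river"
top = retrieve(W, U, H, q, n_top=2)
print(top.tolist())
```

Item 0 ranks first even if it was never tagged "river" itself, because it loads on the same basis vector as the query term.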
Flickr Dataset • Downloaded 50,000 images from Flickr • Average number of distinct tags = 8 • Removed • Rare tags (appearing fewer than 5 times) • Images with no tags or non-English tags • Obtained 20,000 labeled images • Kept 7,000 examples to be used as an internal auxiliary dataset
YouTube Dataset • Downloaded 18,000 videos’ metadata (tags, URL, category, title, comments, etc.) • Average number of distinct tags = 7 • Removed • Rare tags (appearing fewer than 2 times) • Videos with no tags or non-English tags • Obtained a dataset corresponding to 12,000 videos • Again, kept 7,000 examples to be used as an internal auxiliary dataset
LabelMe Dataset • Added 7,000 images with tags from LabelMe • Average number of distinct tags = 32 • Removed • Rare tags (appearing fewer than 2 times) • Cleanup does not reduce the size of the dataset
Evaluation Measures • Defined query set Q • {cloud, man, street, water, road, leg, table, plant, girl, drawer, lamp, bed, cable, bus, pole, laptop, plate, kitchen, river, pool, flower} • Manually annotated the two datasets (Flickr and YouTube) with respect to the query set (no benchmark dataset available) • An image (or video) is relevant to a query term if the concept is clearly visible in it
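The rare-tag cleanup applied to the three datasets can be sketched as follows. This is a minimal illustration that assumes a tag's frequency is the number of items it appears on; language filtering is not shown, and `clean_tags` and the sample data are hypothetical.

```python
from collections import Counter

def clean_tags(items, min_count):
    """Drop tags used on fewer than min_count items, then drop untagged items."""
    counts = Counter(tag for tags in items for tag in set(tags))
    cleaned = [[t for t in tags if counts[t] >= min_count] for tags in items]
    return [tags for tags in cleaned if tags]

items = [["beach", "sea"], ["beach", "asdf"], ["qwerty"], ["sea", "beach"]]
kept = clean_tags(items, min_count=2)
print(kept)
```

Under these assumptions the Flickr cleanup would use `min_count=5`, and the YouTube and LabelMe cleanups would use `min_count=2`.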
Results with JSNMF • Precision-Scope Curve • Fix recall at 0.1 • Users are usually only interested in the first few results • 10% improvement
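For reference, a precision-scope curve reports the fraction of relevant items among the top s results for increasing values of the scope s. The sketch below uses made-up item ids and relevance labels, not data from the experiments.

```python
def precision_at_scope(ranked_ids, relevant_ids, scope):
    """Fraction of the top `scope` ranked items that are relevant."""
    top = ranked_ids[:scope]
    return sum(1 for item in top if item in relevant_ids) / scope

ranked = [3, 7, 1, 9, 4]     # hypothetical ranking returned for one query
relevant = {3, 1, 4}         # hypothetical manual relevance labels
curve = [precision_at_scope(ranked, relevant, s) for s in (1, 2, 5)]
print(curve)
```

Averaging such curves over all terms in the query set gives one curve per method, which is how retrieval with and without the auxiliary source can be compared.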
Results with JSNMF • Under-representation • Shares very few basis vectors • Over-representation • Forces many basis vectors to represent both datasets • Appropriate level of representation
Flickr Retrieval Results • Results are better with LabelMe • As recall increases, precision decreases • When K=0 (no sharing) or K=40 (full sharing), precision is lower than with K=15
YouTube Retrieval Results • Similar to Flickr Results
Extra Notes & Questions? • Can be extended to multiple datasets (not just 2) • The generic model can be applied to other data mining tasks