Information Extraction from Multimedia Content on the Social Web
Stefan Siersdorfer, L3S Research Centre, Hannover, Germany
Meta Data and Visual Data on the Social Web
• Meta Data:
• Tags
• Titles and Descriptions
• Timestamps
• Geo-Tags
• Comments
• Numerical Ratings
• Users and Social Links
• Visual Data:
• Photos
• Videos
How to exploit the combined information from visual data and meta data?
Social Web Environments as Graph Structure
• Entities (Nodes):
• Resources (Videos, Photos)
• Users
• Tags
• Groups
• Relationships (Edges):
• User-User: Contacts, Friendship
• User-Resource: Ownership, Favorite Assignment, Rating
• User-Group: Membership
• Resource-Resource: Visual Similarity, Meta Data Similarity
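The entity graph above can be sketched as a small data structure. This is a minimal illustration, not the original system's implementation; class and relation names are hypothetical.

```python
from collections import defaultdict

# Minimal sketch of the social-web entity graph: typed nodes (users, videos,
# tags, groups) connected by typed, undirected relationship edges.
class SocialGraph:
    def __init__(self):
        self.nodes = {}                 # node id -> entity type
        self.edges = defaultdict(list)  # node id -> [(neighbor id, relation)]

    def add_node(self, node_id, entity_type):
        self.nodes[node_id] = entity_type

    def add_edge(self, a, b, relation):
        # relationships are symmetric, so store both directions
        self.edges[a].append((b, relation))
        self.edges[b].append((a, relation))

    def neighbors(self, node_id, relation=None):
        return [n for n, r in self.edges[node_id]
                if relation is None or r == relation]

g = SocialGraph()
g.add_node("user1", "user")
g.add_node("video1", "video")
g.add_node("tag:cat", "tag")
g.add_edge("user1", "video1", "ownership")
g.add_edge("video1", "tag:cat", "tag-assignment")
print(g.neighbors("video1"))  # ['user1', 'tag:cat']
```

Queries like `neighbors("video1", relation="ownership")` then give exactly the typed links the slide lists.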
User Feedback on the Social Web
• Numeric Ratings, Favorite Assignments
• Comments
• Clicks/Views
• Contacts, Friendships
• Community Tagging
• Blog Entries
• Upload of Content
How can we exploit the community feedback?
Outline • Part 1: Photos on the Social Web • 1.1) Photo Attractiveness • 1.2) Generating Photo Maps • 1.3) Sentiment in Photos • Part 2: Videos on the Social Web • Video Tagging
1.1) Photo Attractiveness *
* Stefan Siersdorfer, Jose San Pedro: Ranking and Classifying Attractiveness of Photos in Folksonomies. 18th International World Wide Web Conference (WWW 2009), Madrid, Spain.
Attractiveness of Images
Which factors influence the human perception of attractiveness?
(Example images: landscape, portrait, flower)
Attractiveness: Visual Features
• Human visual perception is mainly influenced by color distribution and coarseness
• These are complex concepts that convey multiple orthogonal aspects
• Hence the need to consider different low-level features
Attractiveness: Visual Features
• Color Features:
• Brightness, Contrast (luminance, RGB)
• Colorfulness, Naturalness
• Saturation (mean, variance): the intensity of the colors; saturation is 0 for greyscale images
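Two of these color features can be illustrated on a toy "image" of a few RGB pixels. This is a simplified sketch (mean-intensity brightness, HSV-style saturation); the exact formulas used in the paper may differ.

```python
# Toy "image": four RGB pixels with channel values in [0, 255].
pixels = [(200, 30, 30), (40, 40, 40), (250, 250, 250), (10, 120, 60)]

def brightness(p):
    r, g, b = p
    # simple mean-intensity brightness, normalized to [0, 1]
    return (r + g + b) / (3 * 255)

def saturation(p):
    mx, mn = max(p), min(p)
    # HSV-style saturation; 0 for grey pixels, as noted on the slide
    return (mx - mn) / mx if mx else 0.0

b_vals = [brightness(p) for p in pixels]
s_vals = [saturation(p) for p in pixels]
s_mean = sum(s_vals) / len(s_vals)
s_var = sum((s - s_mean) ** 2 for s in s_vals) / len(s_vals)
print(f"brightness mean={sum(b_vals)/len(b_vals):.3f}")
print(f"saturation mean={s_mean:.3f} variance={s_var:.3f}")
```

Note how the two grey pixels get saturation 0, matching the slide's remark about greyscale images.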
Visual Features: Coarseness
• Resolution + Acutance → Sharpness
• Of critical importance for the final appearance of photos [Savakis 2000]
Textual Features
• We consider user-generated meta data
• Correlation of topics with image appeal (ground truth: favorite assignments)
• Tags seem appropriate to capture this information
Attractiveness of Photos
• Community-based models for classifying/ranking images according to their appeal [WWW'09]
• Inputs from the Flickr photo stream:
• Content (visual features)
• Metadata (textual features, e.g. tags: cat, fence, house)
• Community feedback (#views, #comments, #favorites, ... → the photo's interestingness)
• A classification & regression model generator learns attractiveness models from these inputs
Experiments
1.2) Generating Photo Maps *
* Work and illustrations from David Crandall, Lars Backstrom, Dan Huttenlocher, Jon Kleinberg: Mapping the World's Photos. 18th International World Wide Web Conference (WWW 2009), Madrid, Spain.
Outline: Photo Maps
• Use geo-location, tags, and visual features of photos to
• Identify popular locations and landmarks
• Estimate the location of photos
• Find representative images
Spatial Clustering
• Each data point corresponds to the (longitude, latitude) of an image
• Mean shift clustering is applied to obtain a hierarchical structure
• The most distinctive popular tags are used as cluster labels (e.g. eiffel, louvre, paris, tatemodern, trafalgarsquare, london), scored as (# photos with tag in cluster) / (# photos with tag in overall set)
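The tag-distinctiveness score above can be sketched directly from counts. The photo data below is hypothetical toy input; only the scoring formula (# photos with tag in cluster / # photos with tag overall) follows the slide.

```python
from collections import Counter

# Toy photos: (cluster id from spatial clustering, tags of the photo).
photos = [
    ("paris",  ("eiffel", "paris", "travel")),
    ("paris",  ("eiffel", "night")),
    ("london", ("tatemodern", "london", "travel")),
    ("london", ("trafalgarsquare", "london")),
]

# How many photos carry each tag in the overall set.
overall = Counter(t for _, tags in photos for t in tags)

def distinctive_tags(cluster_id, top=2):
    in_cluster = Counter(t for c, tags in photos if c == cluster_id for t in tags)
    # score = (# photos with tag in cluster) / (# photos with tag overall)
    scored = {t: in_cluster[t] / overall[t] for t in in_cluster}
    # prefer high score, break ties by in-cluster popularity
    return sorted(scored, key=lambda t: (-scored[t], -in_cluster[t]))[:top]

print(distinctive_tags("paris"))
print(distinctive_tags("london"))
```

A generic tag like "travel" (spread across clusters) scores low, while "eiffel" (concentrated in one cluster) scores high, which is exactly why it makes a good label.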
Estimating Location of Photos without tags • Train SVMs on Clusters • Positive Examples: Photos in Clusters • Negative Examples: Photos outside the Cluster • Feature Representation • Tags • Visual features (SIFT) • Best Performance for Combination of Tags and SIFT features
Finding Representative Images
• Construct Weighted Graph:
• Weight based on visual similarity of images (using SIFT features)
• Use Graph Clustering (e.g. spectral clustering) to identify tightly connected components
• Choose a representative image from such a tightly connected component
1.3) Sentiment in Photos *
* Stefan Siersdorfer, Jonathon Hare, Enrico Minack, Fan Deng: Analyzing and Predicting Sentiment of Images on the Social Web. 18th ACM Multimedia Conference (MM 2010), Florence, Italy.
Sentiment Analysis of Images
Data: more than 500,000 Flickr photos
• Image Features
• Global Color Histogram: a color is present in the image
• Local Color Histogram: a color is present at a particular location
• SIFT Visual Terms: black-and-white patterns, robust to rotation and scaling
• Image Sentiment
• SentiWordNet provides sentiment values for terms, e.g. (pos, neg, obj) = (0.875, 0.0, 0.125) for the term "good"
• Used to obtain sentiment categories: training set and ground truth for the experiments
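A minimal sketch of how term-level sentiment values can be aggregated into an image-level sentiment category. The mini-lexicon below stands in for SentiWordNet (only the entry for "good" is taken from the slide; the other values and the averaging rule are illustrative assumptions).

```python
# Hypothetical mini-lexicon in SentiWordNet's format: term -> (pos, neg, obj).
# Only "good" matches the slide's example; the rest are made-up values.
LEXICON = {
    "good":      (0.875, 0.0,  0.125),
    "beautiful": (0.75,  0.0,  0.25),
    "sad":       (0.125, 0.75, 0.125),
    "fence":     (0.0,   0.0,  1.0),
}

def image_sentiment(tags):
    # Average (pos - neg) over the image's known tags; the sign of the
    # average gives the sentiment category.
    scores = [LEXICON[t][0] - LEXICON[t][1] for t in tags if t in LEXICON]
    if not scores:
        return "neutral", 0.0
    s = sum(scores) / len(scores)
    category = "pos" if s > 0 else "neg" if s < 0 else "neutral"
    return category, s

print(image_sentiment(["beautiful", "fence"]))  # objective tag dilutes the score
print(image_sentiment(["sad"]))
```

Labels derived this way can then serve as (noisy) training data for classifiers over the visual features listed above.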
Which are the most discriminative visual terms?
• Use the Mutual Information measure to determine these features
• Probabilities (estimated by counting in the image corpus):
• P(t): probability that visual term t occurs in an image
• P(c): probability that an image has sentiment category c ("pos" or "neg")
• P(t,c): probability that an image is in category c and contains visual term t
• Intuition: "Terms with high co-occurrence with a category are more characteristic for that category."
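With the probabilities above estimated from a 2x2 contingency table, the standard mutual information formula MI(T, C) = Σ P(t', c') log( P(t', c') / (P(t') P(c')) ) can be computed directly from counts. This is the textbook MI feature-selection measure; the paper's exact variant is assumed to be equivalent.

```python
import math

def mutual_information(n11, n10, n01, n00):
    """MI between a visual term (present/absent) and a sentiment category,
    from the 2x2 contingency counts:
      n11: images in category c that contain term t
      n10: images not in c that contain t
      n01: images in c without t
      n00: images neither in c nor with t
    """
    n = n11 + n10 + n01 + n00
    mi = 0.0
    # (joint count, marginal count of term event, marginal count of class event)
    for n_tc, n_t, n_c in [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]:
        if n_tc:  # 0 * log(0) is taken as 0
            mi += (n_tc / n) * math.log2(n * n_tc / (n_t * n_c))
    return mi

# A term concentrated in one category scores high; an evenly spread term scores 0.
print(mutual_information(40, 5, 10, 45))
print(mutual_information(25, 25, 25, 25))
```

Ranking all visual terms by this score yields the "most discriminative features" shown on the next slide.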
Most Discriminative Features
Most discriminative visual features, extracted using the Mutual Information measure [ACM MM'10]
Part 2: Videos on the Social Web *
* Stefan Siersdorfer, Jose San Pedro, Mark Sanderson: Content Redundancy in YouTube and its Application to Video Tagging. ACM Transactions on Information Systems (TOIS), 2011.
Stefan Siersdorfer, Jose San Pedro, Mark Sanderson: Automatic Video Tagging using Content Redundancy. 32nd ACM SIGIR Conference, Boston, USA, 2009.
Near-duplicate Video Content
• YouTube: the most important video sharing environment
• [SIGCOMM'07]: 85 M videos, 65 k new videos/day, 100 M downloads per day; traffic to/from YouTube = 10% / 20% of the Web total
• Redundancy: 25% of the videos are near-duplicates
Can we use redundancy to obtain richer video annotations? → Automatic tagging
Automatic Tagging
• What is it good for?
• Additional information → better user experience
• Richer feature vectors for ...
• Automatic data organization (classification and clustering)
• Video search
• Knowledge extraction (→ creating ontologies)
Overlap Graph
(Illustration: Videos 1-5 as nodes, connected by edges wherever their content overlaps)
Neighbor-based Tagging (1): Idea
• Videos 1-3 are neighbors of Video 4, carrying tags {A, B, C}, {A, E}, and {B, E, F}
• Video 4 contains the original tags A, B; tags F, E are automatically obtained from its neighbors
• Criteria for automatic tagging:
• Prefer tags used by many neighbors
• Prefer tags from neighbors with a strong link
Neighbor-based Tagging (2): Formal
rel(t, v) = Σ_{v' ∈ N(v)} w(v, v') · ind(t, v')
• The sum runs over all neighbors v' of video v
• The weights w(v, v') correspond to the overlap between v and v'
• ind(t, v') is an indicator function: 1 if neighbor v' carries tag t, 0 otherwise
Neighbor-based Tagging (3)
• Apply additional smoothing for redundant regions
• Ingredients: the overlap region, the number of neighbors with tag t, a smoothing factor, and subsets of neighbors
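The basic neighbor-based score can be sketched as follows (the smoothing step for redundant regions is omitted). The overlap weights are toy values; the neighbor tag sets mirror the Video 1-3 example.

```python
# Sketch of rel(t, v) = sum over neighbors v' of overlap(v, v') * ind(t, v'),
# for a target video with three neighbors. Overlap weights are illustrative.
neighbors = {          # neighbor video -> overlap weight with the target video
    "video1": 0.8,
    "video2": 0.5,
    "video3": 0.3,
}
tags = {
    "video1": {"A", "B", "C"},
    "video2": {"A", "E"},
    "video3": {"B", "E", "F"},
}

def rel(tag):
    # sum the overlap weights of all neighbors that carry the tag
    return sum(w for v, w in neighbors.items() if tag in tags[v])

scores = {t: rel(t) for t in set().union(*tags.values())}
for t, s in sorted(scores.items(), key=lambda x: -x[1]):
    print(t, round(s, 2))
```

Tags shared by many strongly linked neighbors (here A, B) score highest, exactly matching the two preference criteria stated on the idea slide.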
TagRank
• Also takes transitive relationships into account
• PageRank-like weight propagation
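A hedged sketch of what PageRank-like propagation for a single tag t could look like: the tag's seed weights are spread over the overlap graph by power iteration, so videos two or more hops away also receive weight. Graph, seed, and damping value are illustrative, not the paper's actual parameters.

```python
# Overlap graph: video -> {neighbor: overlap weight} (symmetric edges).
graph = {
    "v1": {"v2": 0.6},
    "v2": {"v1": 0.6, "v3": 0.4},
    "v3": {"v2": 0.4},
}
# Seed: tag t initially occurs only on v1.
seed = {"v1": 1.0, "v2": 0.0, "v3": 0.0}

def tagrank(iterations=50, d=0.85):
    # Personalized-PageRank-style power iteration: each video keeps a
    # (1 - d) share of its seed weight and receives a d-damped,
    # overlap-proportional share of its neighbors' weights.
    rank = dict(seed)
    for _ in range(iterations):
        rank = {
            v: (1 - d) * seed[v] + d * sum(
                rank[u] * w / sum(graph[u].values())
                for u, w in graph[v].items()
            )
            for v in graph
        }
    return rank

print(tagrank())  # v3 gets a nonzero weight purely transitively, via v2
```

This is the transitive effect the slide refers to: v3 never overlaps v1 directly, yet still inherits some relevance for tag t.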
Applications of the Extended Tag Representation
• Use the relevance scores rel(t, v_i) to construct enriched feature vectors for videos: combine original tags with new tags weighted by their relevance values
• Automatic annotation: use thresholding to select the most relevant tags for a given video
• Manual assessment of the tags confirms their relevance
• Data organization:
• Clustering and classification experiments (ground truth: YouTube categories of the videos)
• Improved performance through the enriched feature representation
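The enriched-vector construction and the thresholding step can be sketched as below. The particular weights (1.0 for original tags) and the threshold value are illustrative assumptions, not the paper's tuned settings.

```python
# Original tags of the video, plus neighbor-derived tags with their
# rel(t, v) scores (toy values, continuing the A/B/E/F example).
original = {"A", "B"}
derived = {"E": 0.8, "F": 0.3, "C": 0.1}

def enriched_vector(min_rel=0.0):
    # Original tags get full weight; derived tags keep their relevance score.
    vec = {t: 1.0 for t in original}
    vec.update({t: r for t, r in derived.items()
                if t not in original and r > min_rel})
    return vec

def auto_tags(threshold=0.5):
    # Automatic annotation: keep only tags whose weight clears the threshold.
    return sorted(t for t, r in enriched_vector().items() if r >= threshold)

print(enriched_vector())         # full weighted feature representation
print(auto_tags(threshold=0.5))  # tags selected for automatic annotation
```

The full weighted vector feeds clustering/classification, while the thresholded set is what would be shown as automatically generated tags.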
Summary
• The Social Web contains visual information (photos, videos) and meta data (tags, time stamps, social links, spatial information, ...)
• A large variety of users provide explicit and implicit feedback in social web environments (ratings, views, favorite assignments, comments, content of uploaded material)
• Visual information and annotations can be combined to obtain enhanced feature representations
• Visual information can help to establish links between resources such as videos (application: information propagation)
• Feature representations in combination with community feedback can be used for machine learning (applications: classification, mapping)
References
• Stefan Siersdorfer, Jose San Pedro, Mark Sanderson: Content Redundancy in YouTube and its Application to Video Tagging. ACM Transactions on Information Systems (TOIS), 2011.
• Stefan Siersdorfer, Jonathon Hare, Enrico Minack, Fan Deng: Analyzing and Predicting Sentiment of Images on the Social Web. 18th ACM Multimedia Conference (MM 2010), Florence, Italy.
• Stefan Siersdorfer, Jose San Pedro, Mark Sanderson: Automatic Video Tagging using Content Redundancy. 32nd ACM SIGIR Conference, Boston, USA, 2009.
• Stefan Siersdorfer, Jose San Pedro: Ranking and Classifying Attractiveness of Photos in Folksonomies. 18th International World Wide Web Conference (WWW 2009), Madrid, Spain.
• David Crandall, Lars Backstrom, Dan Huttenlocher, Jon Kleinberg: Mapping the World's Photos. 18th International World Wide Web Conference (WWW 2009), Madrid, Spain.