Commentary-based Video Categorization and Concept Discovery By Janice Leung
Agenda • Introduction to Video Sharing Sites • Current Problem • Previous Works • Commentary-based Video Clustering • Conclusion • Future Works
Video Sharing Sites • Allow users to upload videos • Share videos worldwide • Examples: • Dailymotion • YouTube • MySpace
De Facto • YouTube • More than 65,000 new videos every day • 100 million video views daily • 20 million unique visitors per month
• Immense number of videos • Incredible growth in videos • How to search for a desired video? • YouTube: tags + simple categorization
YouTube • Predefined categories • Videos • Title, Description, Tags, Category: provided by the one who uploads the video • Comments: provided by many users
Related Works • Classify videos: • Video features: color, grayscale histogram, pixel information • Keywords from description • Tags • Find user interests: • Object fetching information • Tags
Problems • Video features • Cannot tell exactly what the video is about • User interests are not considered • Keywords from description • Description provided by the one who uploaded the video • Insufficient information
Problems (Cont.) • Tags • Insufficient information • May reflect users' feelings about videos but too brief to represent the complex ideas of the videos • Object fetching information • Reflects users' interests but gives no information about the videos at all
Video Categorization and Concept Discovery • Site: YouTube • Videos: involving Hong Kong singers
Comment vs Tag • Comments • Given by many users • Can be large in number • Express users' opinions • Rich words describe fine-grained ideas • Tags • Given by only one person (the one who uploaded the video) • Few tags • Describe the video in a very brief way • Singer name • Song name
Comments • Include: • Video content • Music styles • Music ages • Singer description • Appearance • Style • News etc.
Commentary-based Video Categorization • Objective: Categorize videos based on user interests and discover the concepts of videos • Cluster videos by using comments • Group videos based on user interests • Find video concepts • Clustering algorithm: multi-assignment NMF
Video clustering • Bi-clustering: videos and words • Clusters videos and words into k groups by matrix factorization • Video-word matrix X as input • Video-word matrix X is derived by tf-idf
Tf-idf • Term frequency (tf) • Suppose there are t distinct terms in document j, where f_{i,j} is the number of occurrences of term i in document j
Tf-idf (Cont.) • Inverse document frequency (idf), where N is the total number of documents in the dataset and n_i is the number of documents containing term i
Tf-idf (Cont.) • Importance weight of term i to document j • Matrix X as input to NMF is defined as
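The weighting on the tf-idf slides can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the author's exact formulas: it assumes tf is the count f_{i,j} divided by the total count of terms in document j, idf is log(N / n_i), and the weight is their product. The function name `tfidf_matrix` and the idea of merging all comments of a video into one pre-tokenized document are illustrative choices.

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Build a video-by-term tf-idf matrix.

    docs: list of token lists, one per video (all comments of a video merged).
    Returns (X, vocab), where X[j][i] is the weight of term vocab[i] in video j.
    """
    vocab = sorted({term for doc in docs for term in doc})
    index = {term: i for i, term in enumerate(vocab)}
    n_docs = len(docs)  # N: total number of documents

    # n_i: number of documents containing term i
    doc_freq = Counter(term for doc in docs for term in set(doc))

    X = [[0.0] * len(vocab) for _ in docs]
    for j, doc in enumerate(docs):
        counts = Counter(doc)         # f_{i,j}
        total = sum(counts.values())  # ASSUMPTION: normalize by total term count
        for term, f in counts.items():
            tf = f / total
            idf = math.log(n_docs / doc_freq[term])  # ASSUMPTION: natural log, no smoothing
            X[j][index[term]] = tf * idf
    return X, vocab

docs = [["sweet", "voice", "sweet"], ["voice", "dance"], ["dance", "dance", "cool"]]
X, vocab = tfidf_matrix(docs)
```

A term that appears in every document gets weight 0 under this idf, which is the usual reason tf-idf suppresses words that do not discriminate between videos.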
Video Clustering (Cont.) • Decompose X into non-negative matrices W and H by minimizing the reconstruction error between X and WH • Ref.: Document Clustering Based on Non-negative Matrix Factorization (Xu et al., SIGIR'03)
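For reference, the NMF objective in the cited Xu et al. paper is a squared reconstruction error. In this deck's notation it can be written as below, assuming X is N x M (videos by terms), W is N x K (one row per video) and H is K x M (one column per term); the exact form shown on the slide may differ.

```latex
J = \frac{1}{2}\,\lVert X - WH \rVert_F^{2}
  = \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{M}\bigl(x_{n,m} - (WH)_{n,m}\bigr)^{2},
\qquad W \ge 0,\quad H \ge 0 .
```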
Video Clustering (Cont.) NMF decomposition for video clustering
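The decomposition can be sketched with NumPy using the classic Lee-Seung multiplicative updates for the squared Frobenius error. This is a minimal sketch, not necessarily the solver used in the experiments; the function name `nmf`, the iteration count, the `eps` guard and the random initialization are all assumptions.

```python
import numpy as np

def nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative X (N x M) into W (N x k) and H (k x M).

    Uses multiplicative updates that keep W and H non-negative while
    reducing 0.5 * ||X - WH||_F^2.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # eps in the denominators guards against division by zero
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy video-word matrix with two obvious groups of videos and words.
X = np.array([[5, 4, 0, 0],
              [4, 5, 0, 0],
              [0, 0, 3, 4],
              [0, 0, 4, 3]], dtype=float)
W, H = nmf(X, k=2)
```

Rows of W then give each video's affinity to the k clusters, and rows of H give each cluster's affinity to the terms, which is what makes this a bi-clustering of videos and words.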
Video Clustering (Cont.) • Suppose • Number of videos: N • Number of distinct terms: M • Threshold: β • W of size N x K • w_{n,k}: coefficient indicating how strongly video n belongs to cluster k
Video-cluster assignment • Videos can belong to multiple groups • Multi-cluster assignment • Video n belongs to cluster k if • Set of clusters that video n belongs to: where K is the set of all clusters
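Multi-cluster assignment with a single threshold can be sketched as follows. The membership test used here, w_{n,k} >= β, is an assumption about the condition on the slide, and the function name `assign_clusters` is illustrative.

```python
import numpy as np

def assign_clusters(W, beta):
    """For each video n, return the set of cluster indices k with W[n, k] >= beta.

    ASSUMPTION: membership means the coefficient reaches the threshold beta.
    A video may end up in several clusters, or in none.
    """
    W = np.asarray(W, dtype=float)
    return [set(np.flatnonzero(row >= beta).tolist()) for row in W]

W = [[0.65, 0.23, 0.12],
     [0.35, 0.64, 0.01],
     [0.10, 0.10, 0.10]]
memberships = assign_clusters(W, beta=0.3)
```

With β = 0.3 the second video falls into two clusters and the third into none, which shows why the choice of threshold matters.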
Video-cluster assignment (Cont.) • Threshold, β • Many irrelevant videos for each cluster • Coefficient distribution varies for different clusters • Coefficient-distribution dependent • Different for different clusters
Concept Discovery • Matrix H of size K x M • h_{k,m}: how likely term m belongs to cluster k • Terms belonging to a cluster describe the videos in that cluster • Concept words of cluster k videos • Top 10 words of cluster k
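Picking the concept words amounts to sorting each row of H and keeping the highest-weighted terms. A minimal sketch, assuming `vocab` lists the terms in the same order as the columns of H; the function name `concept_words` is illustrative.

```python
import numpy as np

def concept_words(H, vocab, top=10):
    """Return the `top` highest-weighted terms for each cluster (row of H)."""
    H = np.asarray(H, dtype=float)
    # Negate so that argsort yields term indices in descending weight order.
    order = np.argsort(-H, axis=1)[:, :top]
    return [[vocab[i] for i in row] for row in order]

H = [[0.1, 0.9, 0.5],
     [0.7, 0.0, 0.2]]
vocab = ["cute", "rock", "live"]
words = concept_words(H, vocab, top=2)
```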
Experiment • 19305 videos • 102 Hong Kong singers • 7271 users • Number of clusters, k: 20
Experiment (Cont.) • Threshold, β • Coefficient-distribution dependent • Threshold for cluster i is defined as
Experiment (Cont.) • Video coefficients may be distributed extremely unevenly • Causes poor results • To compensate, threshold can be set as
Experiment (Cont.)
      C1    C2    C3
V1    0.65  0.23  0.12
V2    0.35  0.64  0.01
V3    0.65  0.22  0.13
V4    0.05  0.64  0.31
V5    0.00  0.30  0.70
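The example coefficients above can be used to illustrate a per-cluster threshold. The rule below, the mean coefficient of each cluster's column, is purely a hypothetical stand-in chosen for illustration; it is not the threshold definition from the slides.

```python
import numpy as np

# Video-cluster coefficients from the example (rows V1..V5, columns C1..C3).
W = np.array([
    [0.65, 0.23, 0.12],
    [0.35, 0.64, 0.01],
    [0.65, 0.22, 0.13],
    [0.05, 0.64, 0.31],
    [0.00, 0.30, 0.70],
])

# ASSUMPTION (hypothetical rule): one threshold per cluster, set to the
# mean coefficient of that cluster's column.
beta = W.mean(axis=0)

videos = ["V1", "V2", "V3", "V4", "V5"]
clusters = ["C1", "C2", "C3"]
assignment = {
    v: [c for c, w, b in zip(clusters, row, beta) if w >= b]
    for v, row in zip(videos, W)
}
for v, cs in assignment.items():
    print(v, cs)
```

Under this assumed rule, V2 is assigned to both C1 and C2 and V4 to both C2 and C3, while V1, V3 and V5 each stay in a single cluster.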
Concept Words vs Tags • Figure: percentage of videos with tags covering concept words, across groups
Singer Relationship Discovery • Comments on videos may talk about singers • Singer styles, appearance, news • Singer clustering using comments • Reveals relationships between singers • Discovers hidden phenomena
Conclusion • Captures user interests more accurately and fairly than human-predefined categories • Categories can change dynamically as user interests change over time • Obtains clusters with fine-grained ideas • Eases the task of video search by categorizing videos and refining the index
Future Works • Extend to user clustering • Obtain relationships among videos, singers and users across the entire social network • Study the social culture • Ease the job of advertising to target customers • Connect people who share the same interests
Q & A Questions?