News Contextualization with Geographic and Visual Information. ACM Multimedia 2011, 29/11/2011. Zechao Li (1,2), Meng Wang (3), Jing Liu (1), Changsheng Xu (1,2) and Hanqing Lu (1,2). (1) Institute of Automation, Chinese Academy of Sciences; (2) China-Singapore Institute of Digital Media; (3) School of Computing, National University of Singapore. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
Outline • Motivation • News Contextualization • Location relevance analysis • Image enrichment • Evaluation • Discussion
Online News Reading – Popular News sites People: online reading
Online News Reading • Specific places • Hometown, workplace, country • Insufficient visual information • Too few images
Our Solution • News Contextualization with Geographic and Visual Information • [Diagram labels: Semi-Supervised Learning, Multimodal Fusion, News Contextualization, Features/Distribution/Boosting, Post-Processing]
Location Relevance Analysis • News from web • Wikipedia (1), GeoNames (2) • Toponym Candidates • Toponym Filtering & Expansion • Location Relevance Analysis • (1) http://en.wikipedia.org/wiki/Main_Page (2) http://www.geonames.org/
Matrix Factorization Relevance Analysis • Basic idea • Where and What (news document) • Low rank • Factor model • Matrix factorization • Document similarity and toponym co-occurrence • Correlation Consistent Probabilistic Matrix Factorization (CCPMF) • [Diagram labels: What, Where, Event Similarity, Location Co-occurrence]
CCPMF • Notation • Rij: the initial relation between locations and documents • I: the indicator matrix • P: the latent location feature matrix • E: the latent document feature matrix • LC and LS: the Laplacian matrices of the document graph and the location graph
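The slides list the CCPMF notation but not the objective itself. The sketch below shows one common way such a model is written: a masked squared-error fit of R by the product of the latent factors, plus norm penalties and graph-Laplacian smoothness terms. The exact form, the weights lam, alpha and beta, and the names lap_loc and lap_doc are assumptions made for illustration, not the authors' stated formulation.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def laplacian(weights):
    """Graph Laplacian L = D - W of a symmetric affinity matrix W."""
    n = len(weights)
    return [[(sum(weights[a]) if a == b else 0.0) - weights[a][b]
             for b in range(n)] for a in range(n)]


def smoothness(features, lap):
    """tr(F^T L F): small when strongly linked rows have similar features."""
    n = len(features)
    return sum(lap[a][b] * dot(features[a], features[b])
               for a in range(n) for b in range(n))


def ccpmf_style_loss(R, I, P, E, lap_loc, lap_doc,
                     lam=0.1, alpha=0.1, beta=0.1):
    """Assumed objective: masked fit + norm penalty + two Laplacian terms.

    R[i][j]: initial relation between location i and document j
    I[i][j]: 1 if R[i][j] is observed, else 0
    P[i]:    latent feature vector of location i
    E[j]:    latent feature vector of document j
    """
    fit = sum(I[i][j] * (R[i][j] - dot(P[i], E[j])) ** 2
              for i in range(len(P)) for j in range(len(E)))
    norm = sum(dot(p, p) for p in P) + sum(dot(e, e) for e in E)
    return (0.5 * fit + 0.5 * lam * norm
            + 0.5 * alpha * smoothness(P, lap_loc)
            + 0.5 * beta * smoothness(E, lap_doc))
```

Minimizing a loss of this shape (for example by gradient descent on P and E) pulls co-occurring locations and similar documents toward nearby latent vectors, which is the "correlation consistent" idea named on the previous slide.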
Image Enrichment • Query Generation • News Document • Google Image • Online Image Search • Image Output
Query Generation Difficulties Our Solution
Query Generation • Score terms in the title • Top c terms • L queries
Query Generation • An example title: "Obama bids China farewell with Great Wall tour"
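The three steps above (score title terms, keep the top c, emit L queries) can be sketched as follows. The slides do not say how terms are scored or how the top terms are combined, so the IDF-style score, the stopword list, the hypothetical doc_freq table, and the choice to pair terms are all assumptions made for illustration.

```python
import math
from itertools import combinations

# Hypothetical stopword list; the slides do not specify one.
STOPWORDS = {"with", "the", "a", "of", "in", "on", "to"}


def score_terms(title, doc_freq, n_docs):
    """Assumed scoring: smoothed inverse document frequency per title term."""
    terms = [t.lower() for t in title.split() if t.lower() not in STOPWORDS]
    return {t: math.log((n_docs + 1) / (doc_freq.get(t, 0) + 1)) for t in terms}


def generate_queries(title, doc_freq, n_docs, c=4, num_queries=3, size=2):
    """Keep the top-c terms, then emit the best-scoring term combinations."""
    scores = score_terms(title, doc_freq, n_docs)
    top = sorted(scores, key=scores.get, reverse=True)[:c]
    candidates = list(combinations(top, size))
    candidates.sort(key=lambda q: sum(scores[t] for t in q), reverse=True)
    return [" ".join(q) for q in candidates[:num_queries]]


# Hypothetical document frequencies for the example title.
doc_freq = {"obama": 50, "china": 80, "bids": 5, "farewell": 3,
            "great": 40, "wall": 20, "tour": 30}
queries = generate_queries(
    "Obama bids China farewell with Great Wall tour", doc_freq, n_docs=1000)
```

Here num_queries plays the role of L on the slide. A pure IDF score favors rare words over named entities, so a real system would likely use a richer score; the point of the sketch is only the score, select, combine structure.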
Image Mining & Selecting • How to find the appropriate images? • Score-based rank aggregation • Position & Visual similarity • Notation • h: the number of images in each list • k: the position of the i-th image in the j-th list • O: the set of original pictures
Image Mining & Selecting • How to determine the weights? • Manually label some ground truth • Tune the weights to maximize NDCG@15 • Top r images, including the original pictures
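A minimal sketch of score-based rank aggregation using the quantities named on these slides: h images per result list, the position k of an image in a list, and visual similarity to the original pictures O. The linear position score (h - k + 1) / h, the additive combination, and the precomputed visual_sim table are assumptions made for illustration; the slides do not give the scoring formula.

```python
def aggregate(lists, list_weights, visual_sim, visual_weight, h, r):
    """Fuse several ranked image lists into one top-r list.

    lists:         one ranked list of image ids per query
    list_weights:  one weight per list (tuned on labeled data in the slides)
    visual_sim:    image id -> similarity to the original pictures O
                   (hypothetical precomputed values in [0, 1])
    """
    scores = {}
    for images, w in zip(lists, list_weights):
        for k, img in enumerate(images[:h], start=1):
            # Position score: 1.0 for the first result, 1/h for the last.
            scores[img] = scores.get(img, 0.0) + w * (h - k + 1) / h
    for img in scores:
        scores[img] += visual_weight * visual_sim.get(img, 0.0)
    return sorted(scores, key=scores.get, reverse=True)[:r]


top = aggregate(
    lists=[["a", "b", "c"], ["b", "a", "d"]],
    list_weights=[0.5, 0.5],
    visual_sim={"a": 0.1, "b": 0.9},
    visual_weight=1.0,
    h=3,
    r=2,
)
```

In this toy input, images "a" and "b" tie on position alone, and the visual similarity term breaks the tie in favor of "b". The weights would be tuned as the slide describes, by maximizing NDCG@15 on the labeled documents.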
Experiments • Data • ABC, BBC, CNN and Google News • 135,308 documents with 69,144 images • 4,742 locations • User Study • 30 participants, aged 20-35 • From two countries, frequent online news readers • NDCG • Very relevant, relevant, irrelevant: 2, 1, 0
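For reference, NDCG over the three-level grades above (2, 1, 0) can be computed as in the sketch below. It assumes the common exponential gain 2^g - 1 with a log2 rank discount, and it normalizes against the ideal reordering of the same result list; the slides do not say which NDCG variant was used.

```python
import math


def dcg(grades, k):
    """Discounted cumulative gain over the first k graded results."""
    return sum((2 ** g - 1) / math.log2(rank + 2)
               for rank, g in enumerate(grades[:k]))


def ndcg(grades, k):
    """DCG divided by the DCG of the same grades sorted best-first."""
    ideal = dcg(sorted(grades, reverse=True), k)
    return dcg(grades, k) / ideal if ideal > 0 else 0.0
```

For example, ndcg([2, 1, 0], 3) is 1.0 because the list is already in ideal order, while any list that places an irrelevant result above a relevant one scores below 1.0.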
Experiment I – Location Relevance Analysis • News Search: NDCG • BM25 • PMF [4] • Parameters • [4] R. Salakhutdinov and A. Mnih. "Probabilistic Matrix Factorization". NIPS 2008
Experiment II – Image Enrichment • Label 300 documents to train the weights • Compared methods • Naïve Search: the whole title as a query • Naïve Fusion: each term in the title as a query
Experiment III – NewsMap • Compared with Yahoo News Map • Convenience • Efficiency • Usefulness • Score: [1, 5]
Conclusions • News browsing system: NewsMap • A novel matrix factorization to analyze the location relevance • Effective strategies to generate queries and intelligently fuse the results
Future work • Organize news with a topic discovery component • News recommendation • Extend CCPMF to other potential applications such as shopping and local services
News Ranking • Relevance, Timeliness and Importance • relevance: CCPMF • timeliness: ‘YYYYMMDD’ • importance: news similarity • PageRank
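The importance signal above pairs news similarity with PageRank. A minimal sketch, assuming importance is the PageRank score of each document on a weighted news-similarity graph: the damping factor, the iteration count, and the toy similarity matrix are assumptions made for illustration, and the slides do not say how importance is combined with relevance and timeliness.

```python
def pagerank(sim, damping=0.85, iters=50):
    """Power-iteration PageRank on a weighted similarity matrix.

    sim[i][j] is the (non-negative) similarity from document i to j.
    """
    n = len(sim)
    rank = [1.0 / n] * n
    out = [sum(row) for row in sim]
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for i in range(n):
            if out[i] == 0:
                # A document similar to nothing spreads its rank uniformly.
                for j in range(n):
                    new[j] += damping * rank[i] / n
            else:
                for j in range(n):
                    new[j] += damping * rank[i] * sim[i][j] / out[i]
        rank = new
    return rank


# Toy graph: document 0 is similar to both others; 1 and 2 only to 0.
importance = pagerank([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
```

Document 0 receives the highest score because both other documents point to it, which matches the intuition that a story echoed by many similar stories is more important.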
Experiment – News Ranking • PRT: only time information • PRR: only relevance information • BM25