SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui CAI +, Xin-jing WANG +, Wei WANG *, Lei ZHANG + * Fudan University + Microsoft Research Asia
OUTLINE • Motivation • Challenges • Model • Application • Reply reconstruction • Junk post detection • Expert finding • Experiments • Conclusion
THREADED DISCUSSIONS • Chat rooms • IMs • Mailing lists • Web forums • Thread structure: root post and replies
MINING SEMANTICS & STRUCTURE • Junk identification • Expert search • Measure post quality • …
CHALLENGE • Semantics & structure • Junk posts • Post quality
SEMANTICS & STRUCTURE • Semantics: topics • Structure: who replies to whom
POST QUALITY valuable post
MODEL • Purpose: simultaneously model • Semantics • Structure • Methodology • Intuitive • Matrix-based • Sparse coding
SEMANTIC REPRESENTATION OF THREAD • Project posts to topic space: D ≈ XΘ • Minimize: the reconstruction error between D and XΘ
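The projection on this slide can be sketched in a few lines. This is only an illustration under assumptions, since the slide's own formula is not in the transcript: D is taken to be a term-by-post matrix, X a term-by-topic basis, Θ a topic-by-post coefficient matrix, and the error is taken to be the squared Frobenius norm. The function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def reconstruction_error(D, X, Theta):
    """Squared Frobenius norm of D - X*Theta (assumed error measure)."""
    approx = matmul(X, Theta)
    return sum((d - a) ** 2
               for d_row, a_row in zip(D, approx)
               for d, a in zip(d_row, a_row))


# Two terms, two posts, one topic: the single topic only explains term 0.
D = [[1.0, 0.0],
     [0.0, 1.0]]
X = [[1.0],
     [0.0]]
Theta = [[1.0, 0.0]]
error = reconstruction_error(D, X, Theta)
```

A smaller error means the topic space explains the posts better; with a full-rank basis the error can reach zero.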
A POST IS RELATED TO PREVIOUS POSTS • Approximate each post as a linear combination of previous posts • Minimize: the error of approximating each post's column of Θ by the previous posts' columns, weighted by b
A POST IS RELATED TO A FEW TOPICS • Example terms in the figure: government, cobol
SPARSE SEMANTICS OF POST • D ≈ XΘ, with Θ sparse • Minimize: the reconstruction error plus a sparsity penalty on Θ
A POST IS RELATED TO A FEW POSTS • Approximate each post as a linear combination of previous posts • b: sparse • Minimize: the approximation error plus a sparsity penalty on b
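The idea that a post depends on only a few earlier posts can be illustrated with an L1-penalised least-squares fit. The sketch below uses ISTA (proximal gradient with soft-thresholding), which is a swapped-in generic solver and not necessarily the optimiser used by the authors. The function names, the penalty weight `lam`, and the toy vectors are all assumptions.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal step for the L1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sparse_reply_weights(target, previous, lam=0.1, iters=500):
    """Minimise ||target - sum_k b[k] * previous[k]||^2 + lam * ||b||_1 by ISTA.

    target:   topic vector of the current post
    previous: topic vectors of the earlier posts in the thread
    """
    n = len(previous)
    if n == 0:
        return []
    frob = sum(x * x for vec in previous for x in vec)
    if frob == 0.0:
        return [0.0] * n
    # 2 * frob bounds the Lipschitz constant of the gradient, so this step is safe.
    step = 1.0 / (2.0 * frob)
    dims = range(len(target))
    b = [0.0] * n
    for _ in range(iters):
        residual = [sum(b[k] * previous[k][j] for k in range(n)) - target[j]
                    for j in dims]
        grad = [2.0 * sum(previous[k][j] * residual[j] for j in dims)
                for k in range(n)]
        b = [soft_threshold(b[k] - step * grad[k], step * lam) for k in range(n)]
    return b


# The current post matches the first earlier post and ignores the second.
weights = sparse_reply_weights([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], lam=0.1)
```

In this toy thread the weight on the unrelated post is exactly zero, and the weight on the matching post is slightly below 1 because the penalty shrinks it. Setting `lam=0` gives the non-sparse linear combination from the earlier slide.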
OPTIMIZE THEM TOGETHER • Model semantics • Model structure
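One plausible way to write the combined objective is shown below. The slide's own formula is not in this transcript, so the squared-error terms, the L1 penalties, the weights λ1, λ2, λ3, and the notation θ^(i) for the i-th column of Θ and b^(i) for the weights of post i over earlier posts are all assumptions.

```latex
\[
\min_{X,\,\Theta,\,b}\;
\underbrace{\lVert D - X\Theta \rVert_F^2}_{\text{semantics}}
\;+\; \lambda_1 \underbrace{\sum_{i} \Bigl\lVert \theta^{(i)} - \sum_{k<i} b^{(i)}_k\, \theta^{(k)} \Bigr\rVert_2^2}_{\text{structure}}
\;+\; \lambda_2 \sum_{i} \lVert \theta^{(i)} \rVert_1
\;+\; \lambda_3 \sum_{i} \lVert b^{(i)} \rVert_1
\]
```

The first term fits the topic space, the second ties each post to earlier posts, and the two L1 terms keep both the topics per post and the related posts per post few.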
APPLICATIONS • Reply reconstruction • Capability of recognizing structure • Junk identification • Capability of capturing semantics • Expert finding • Capability of measuring post quality
REPLY RECONSTRUCTION • Document similarity • Topic similarity • Structure similarity
DATA SET • Slashdot • Apple discussion
BASELINES • NP: reply to nearest post • RR: reply to root • DS: document similarity • LDA (Latent Dirichlet Allocation): project documents to topic space • SWB (Special Words Topic Model with Background distribution): project documents to topic and junk-topic space
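The three simplest baselines above (NP, RR, DS) can be sketched as follows. This is a minimal illustration, assuming posts are listed in posting order with the root at index 0, and assuming DS means cosine similarity over raw word counts; the function names and the toy thread are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reply_to_nearest(i):
    """NP: each post replies to the post immediately before it."""
    return i - 1


def reply_to_root(i):
    """RR: each post replies to the root post."""
    return 0


def reply_by_document_similarity(posts, i):
    """DS: post i replies to the most similar earlier post (i >= 1)."""
    vectors = [Counter(text.lower().split()) for text in posts]
    return max(range(i), key=lambda k: cosine(vectors[i], vectors[k]))


thread = [
    "apple releases new laptop",
    "the laptop battery is great",
    "politics again",
    "battery life on the laptop is great",
]
predicted_parent = reply_by_document_similarity(thread, 3)
```

For the last post in this toy thread, NP picks post 2, RR picks post 0, and DS picks post 1, the one that shares the most vocabulary.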
JUNK IDENTIFICATION • D = • X = • Θ = • Probability of junk
DATA SET • Slashdot • Apple discussion
BASELINES • DF • SVM: classify posts as junk or non-junk • SWB (Special Words Topic Model with Background distribution): project documents to topic and junk-topic space
BASELINES • LM: Formal Models for Expert Finding in Enterprise Corpora, SIGIR ’06; achieves stable performance in the expert finding task using a language model • PageRank: benchmark nodal ranking method • HITS: finds hub nodes and authority nodes • EABIF: Personalized Recommendation Driven by Information Flow, SIGIR ’06; finds the most influential node
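As an illustration of the PageRank baseline, the sketch below ranks users on a reply graph by power iteration. How the graph was built for the experiments is not stated in the slides, so the edge direction (from the replier to the author being replied to), the damping factor, and the user names are all assumptions.

```python
def pagerank(edges, damping=0.85, iters=100):
    """Power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)
    count = len(nodes)
    rank = {node: 1.0 / count for node in nodes}
    for _ in range(iters):
        # Rank held by nodes without out-links is spread over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / count + damping * dangling / count
                    for node in nodes}
        for source in nodes:
            targets = out_links[source]
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# bob and carol both reply to alice; alice replies to bob.
replies = [("bob", "alice"), ("carol", "alice"), ("alice", "bob")]
scores = pagerank(replies)
top_user = max(scores, key=scores.get)
```

Users who attract replies from many others, or from highly ranked others, end up with higher scores, which is what makes this a natural structure-only baseline for expert finding.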
EVALUATION • Bayesian estimate
DISCUSSION • Parameters vs. model complexity • Linear regression • SMSS model • Although the number of parameters increases, prior knowledge shrinks the projection space.
CONCLUSION • Purpose • Mine the semantics • Mine the structure • Highlight • Simultaneously model the semantics and the structure • Applications are designed to evaluate the model • Reply reconstruction • Junk identification • Expert finding