
Finding High-Quality Content in Social Media

Finding High-Quality Content in Social Media. chenwq, 2011/11/26. Authors: Eugene Agichtein, Emory University. Research: Intelligent Information Access Lab (IRLab). News: our team wins the "Best Paper" award at SIGIR 2011.


Presentation Transcript


  1. Finding High-Quality Content in Social Media chenwq 2011/11/26

  2. Authors • Eugene Agichtein, Emory University • Research: Intelligent Information Access Lab (IRLab) • News: our team wins the "Best Paper" award at SIGIR 2011.

  3. Abstract • Since the early 2000s, user-generated content has become popular on the web. The quality of user-generated content varies drastically, from excellent to abuse and spam. • Goal: automatically separate high-quality content from the rest • Graph-based framework • Combine the different sources of evidence in a classification formulation

  4. Contents 1 Related work 2 CONTENT QUALITY ANALYSIS 3 MODELING CONTENT QUALITY 4 EXPERIMENT & Conclusion

  5. Related work • Link analysis in social media • Propagating reputation • Question/answering portals and forums • Expert finding • Text analysis for content quality • Implicit feedback for ranking

  6. Related work • Link analysis in social media • G = (V, E) • V corresponds to the users of a question/answer system • There is a directed edge e = (u, v) ∈ E from a user u ∈ V to a user v ∈ V if user u has answered at least one question asked by user v • G’ = (V, E’) • PageRank, ExpertiseRank, HITS
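The graph definition on slide 6 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the damping factor of 0.85, and the uniform handling of users with no outgoing edges are all assumptions. Note that with the edge direction given on the slide (answerer to asker), PageRank mass flows toward askers; running it on the transposed graph rewards answerers instead.

```python
def build_answer_graph(answers):
    """Build G = (V, E) from (answerer, asker) pairs: edge u -> v if u answered v."""
    graph = {}
    for answerer, asker in answers:
        graph.setdefault(answerer, set()).add(asker)
        graph.setdefault(asker, set())  # every user is a vertex, even with no out-edges
    return graph


def transpose(graph):
    """Reverse every edge of the graph."""
    reversed_graph = {u: set() for u in graph}
    for u, targets in graph.items():
        for v in targets:
            reversed_graph[v].add(u)
    return reversed_graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank (damping value is an assumed default)."""
    nodes = list(graph)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        new_rank = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            targets = graph[u]
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
            else:
                # Dangling vertex: spread its rank uniformly over all vertices.
                share = damping * rank[u] / n
                for v in nodes:
                    new_rank[v] += share
        rank = new_rank
    return rank


# Toy data (hypothetical users): bob and carol answered alice, alice answered dave.
answers = [("bob", "alice"), ("carol", "alice"), ("alice", "dave")]
graph = build_answer_graph(answers)
asker_scores = pagerank(graph)
answerer_scores = pagerank(transpose(graph))
```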

  7. Contents 1 Related work 2 CONTENT QUALITY ANALYSIS 3 MODELING CONTENT QUALITY 4 EXPERIMENT & Conclusion

  8. CONTENT QUALITY ANALYSIS ——Intrinsic content quality • As a baseline, we use textual features only: all word n-grams up to length 5 that appear in the collection more than 3 times are used as features
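A rough sketch of the n-gram baseline described on slide 8 follows. Whitespace tokenization, lowercasing, and the helper names are assumptions made for illustration; the slide only fixes the maximum length (5) and the frequency cutoff (more than 3 occurrences in the collection).

```python
from collections import Counter


def word_ngrams(tokens, max_n=5):
    """Yield every word n-gram of length 1..max_n as a tuple."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def build_vocabulary(documents, max_n=5, threshold=3):
    """Keep n-grams that occur more than `threshold` times in the collection."""
    counts = Counter()
    for text in documents:
        counts.update(word_ngrams(text.lower().split(), max_n))
    return {gram for gram, count in counts.items() if count > threshold}


def featurize(text, vocabulary, max_n=5):
    """Map a text to counts of the vocabulary n-grams it contains."""
    grams = Counter(word_ngrams(text.lower().split(), max_n))
    return {gram: count for gram, count in grams.items() if gram in vocabulary}


# Toy collection (hypothetical): one phrase repeated four times, one seen once.
documents = ["good answer"] * 4 + ["rare phrase"]
vocabulary = build_vocabulary(documents)
features = featurize("Good answer here", vocabulary)
```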

  9. CONTENT QUALITY ANALYSIS ——Intrinsic content quality • Punctuation and typos: punctuation, capitalization, spacing density, character-level entropy, spelling mistakes, out-of-vocabulary words • Syntactic and semantic: average number of syllables per word, entropy of word lengths, readability measures • Grammaticality: part-of-speech sequences, formality score, distance between the text's (trigram) language model and several given language models
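Several of the surface features listed on slide 9 are simple to compute. The sketch below covers punctuation density, capitalization, spacing density, character-level entropy, and entropy of word lengths. The exact definitions (for example, normalizing by total character count) are assumptions, since the slides only name the features.

```python
import math
import string
from collections import Counter


def shannon_entropy(items):
    """Shannon entropy in bits of the empirical distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def surface_features(text):
    """Compute a few intrinsic-quality features from raw text."""
    n_chars = len(text)
    letters = [ch for ch in text if ch.isalpha()]
    words = text.split()
    punctuation = sum(ch in string.punctuation for ch in text)
    spaces = sum(ch.isspace() for ch in text)
    uppercase = sum(ch.isupper() for ch in letters)
    return {
        "punctuation_density": punctuation / n_chars if n_chars else 0.0,
        "capitalization_ratio": uppercase / len(letters) if letters else 0.0,
        "spacing_density": spaces / n_chars if n_chars else 0.0,
        "char_entropy": shannon_entropy(text),
        "word_length_entropy": shannon_entropy(len(w) for w in words),
    }


example = surface_features("Hi, you!")
```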

  10. CONTENT QUALITY ANALYSIS ——User relationships • Items and users graph • User-user graph: an edge from user u to user v means that u has answered a question from v

  11. CONTENT QUALITY ANALYSIS ——Usage statistics • The number of clicks on an item • The dwell time on an item

  12. CONTENT QUALITY ANALYSIS ——Classification framework • We cast the problem of quality ranking as a binary classification task • Support vector machines • Log-linear classifiers • Stochastic gradient boosted trees • Our goal is to discover interesting, well-formulated and factually accurate content
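To make the binary-classification framing concrete, here is a small log-linear (logistic regression) classifier trained with stochastic gradient descent. It stands in for only one of the three classifier families named on slide 12; the learning rate, epoch count, and the two toy features are assumptions chosen purely for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(examples, labels, epochs=200, lr=0.1):
    """Fit weights and bias by per-example gradient steps on the log loss."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def quality_score(x, weights, bias):
    """Probability that an item is high quality; usable directly for ranking."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical features: [normalized length, typo rate]; label 1 = high quality.
examples = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels = [1, 1, 0, 0]
weights, bias = train_logistic(examples, labels)
high = quality_score([0.85, 0.15], weights, bias)
low = quality_score([0.15, 0.85], weights, bias)
```

Because the classifier outputs a probability, sorting items by `quality_score` turns the binary decision back into a quality ranking.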

  13. Contents 1 Related work 2 CONTENT QUALITY ANALYSIS 3 MODELING CONTENT QUALITY 4 EXPERIMENT & Conclusion

  14. MODELING CONTENT QUALITY ——user relationships • Our dataset, viewed as a graph, is illustrated in Figure 1

  15. MODELING CONTENT QUALITY ——user relationships • The relationships between questions, users asking and answering questions, and answers can be captured by a tripartite graph outlined in Figure 2

  16. MODELING CONTENT QUALITY ——user relationships • the unique characteristics of the community question/answering domain

  17. MODELING CONTENT QUALITY ——user relationships • Question subtree • Q Features from the question being answered • QU Features from the asker of the question being answered • QA Features from the other answers to the same question

  18. MODELING CONTENT QUALITY ——user relationships • User subtree • UA Features from the answers of the user • UQ Features from the questions of the user • UV Features from the votes of the user • UQA Features from answers received to the user’s questions • U Other user-based features
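The question and user subtrees on slides 17 and 18 amount to walking the question/user/answer graph outward from one answer and aggregating what is found. The sketch below derives one toy feature per group for a given answer. The record types, field names, and the specific count-based features are hypothetical; the slides only name the feature groups (Q, QU, QA, UA, UQ, and so on).

```python
from dataclasses import dataclass


@dataclass
class Question:
    qid: str
    asker: str
    text: str


@dataclass
class Answer:
    aid: str
    qid: str
    answerer: str
    text: str


def answer_context_features(answer, questions, answers):
    """Collect one illustrative feature per subtree group for `answer`."""
    question = questions[answer.qid]
    other_answers = [a for a in answers if a.qid == answer.qid and a.aid != answer.aid]
    user_answers = [a for a in answers if a.answerer == answer.answerer]
    user_questions = [q for q in questions.values() if q.asker == answer.answerer]
    asker_questions = [q for q in questions.values() if q.asker == question.asker]
    return {
        "Q_question_length": len(question.text.split()),  # question being answered
        "QU_asker_num_questions": len(asker_questions),   # asker of that question
        "QA_num_other_answers": len(other_answers),       # other answers to it
        "UA_user_num_answers": len(user_answers),         # answers of the answerer
        "UQ_user_num_questions": len(user_questions),     # questions of the answerer
    }


questions = {
    "q1": Question("q1", "alice", "How do I sort a list"),
    "q2": Question("q2", "bob", "What is HITS"),
}
answers = [
    Answer("a1", "q1", "bob", "Use sorted"),
    Answer("a2", "q1", "carol", "Call list.sort"),
    Answer("a3", "q2", "carol", "A link analysis algorithm"),
]
context = answer_context_features(answers[0], questions, answers)
```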

  19. MODELING CONTENT QUALITY ——user relationships • Question features

  20. MODELING CONTENT QUALITY ——user relationships • Implicit user-user relations • G = (V, E) • E = E_a ∪ E_b ∪ E_v ∪ E_s ∪ E_+ ∪ E_− • G_x = (V, E_x) • h_x: the vector of hub scores on the vertices V • a_x: the vector of authority scores • p_x: the vector of PageRank scores • p′_x: the vector of PageRank scores in the transposed graph
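The hub and authority vectors for each graph G_x can be computed with the standard HITS iteration, run once per edge type. The sketch below is a plain textbook HITS with L2 normalization, not code from the paper. The slide does not spell out what each edge type stands for, so the dictionary keys and user names used here are purely illustrative.

```python
import math


def _normalize(scores):
    """Scale a score dictionary to unit L2 norm (all zeros stay zero)."""
    norm = math.sqrt(sum(s * s for s in scores.values()))
    return {u: (s / norm if norm else 0.0) for u, s in scores.items()}


def hits(edges, iterations=50):
    """Return (hub, authority) score dictionaries for a set of directed edges."""
    nodes = sorted({u for edge in edges for u in edge})
    hub = {u: 1.0 for u in nodes}
    auth = {u: 1.0 for u in nodes}
    for _ in range(iterations):
        new_auth = {u: 0.0 for u in nodes}
        for u, v in edges:
            new_auth[v] += hub[u]      # authority: sum of hubs pointing in
        new_hub = {u: 0.0 for u in nodes}
        for u, v in edges:
            new_hub[u] += new_auth[v]  # hub: sum of authorities pointed to
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    return hub, auth


# One edge set per relation type x; the keys here are hypothetical labels.
edge_sets = {
    "answers": {("u1", "x"), ("u2", "x"), ("u1", "y")},
    "votes": {("u2", "u1")},
}
scores = {x: hits(edges) for x, edges in edge_sets.items()}
hub_a, auth_a = scores["answers"]
```

Each user then contributes one hub and one authority value per edge type to the feature vector, alongside the PageRank scores on G_x and its transpose.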

  21. MODELING CONTENT QUALITY ——user relationships • Implicit user-user relations

  22. MODELING CONTENT QUALITY ——user relationships • Content features for QA • Aim: identify the most salient features for the specific tasks of question or answer quality classification • The KL-divergence between the language models of the two texts • Their non-stopword overlap • The ratio between their lengths
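The three question/answer pair features on slide 22 can be sketched as follows. Several details are assumptions, because the slide does not fix them: unigram language models with add-one smoothing over the joint vocabulary, KL taken from the question model to the answer model, overlap measured as Jaccard similarity, length ratio taken as answer over question, and a tiny illustrative stopword list.

```python
import math
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = frozenset({"the", "a", "an", "is", "of", "to", "and", "in", "it", "how"})


def unigram_model(tokens, vocabulary, alpha=1.0):
    """Add-alpha smoothed unigram distribution over `vocabulary`."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}


def kl_divergence(p, q):
    """KL(p || q) in nats; both must be defined on the same support."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def qa_pair_features(question, answer):
    """Compute KL-divergence, non-stopword overlap, and length ratio."""
    q_tokens = question.lower().split()
    a_tokens = answer.lower().split()
    vocabulary = set(q_tokens) | set(a_tokens)
    p = unigram_model(q_tokens, vocabulary)
    q = unigram_model(a_tokens, vocabulary)
    q_content = set(q_tokens) - STOPWORDS
    a_content = set(a_tokens) - STOPWORDS
    union = q_content | a_content
    return {
        "kl_divergence": kl_divergence(p, q),
        "non_stopword_overlap": len(q_content & a_content) / len(union) if union else 0.0,
        "length_ratio": len(a_tokens) / len(q_tokens) if q_tokens else 0.0,
    }


pair = qa_pair_features("how to sort a list", "sort the list with sorted")
same = qa_pair_features("sort a list", "sort a list")
```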

  23. MODELING CONTENT QUALITY ——user relationships • Usage features for QA • number of item views (clicks) • Metadata of question • how long ago the question was posted • derived statistics • the expected number of views for a given category • the deviation from the expected number of views • other second-order statistics • the click frequency
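The derived usage statistics on slide 23 can be illustrated with a short sketch. It assumes that the expected number of views for a category is the category mean, that click frequency means views per hour since posting, and that timestamps are given in seconds; none of these definitions are stated on the slide, and the field names are hypothetical.

```python
from collections import defaultdict


def expected_views_by_category(items):
    """Mean view count per category (assumed definition of 'expected views')."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for item in items:
        totals[item["category"]] += item["views"]
        counts[item["category"]] += 1
    return {category: totals[category] / counts[category] for category in totals}


def usage_features(item, expected, now):
    """Raw and derived usage features for one item; times are in seconds."""
    age_hours = max((now - item["posted"]) / 3600.0, 1e-9)  # avoid division by zero
    expected_views = expected[item["category"]]
    return {
        "views": item["views"],
        "age_hours": age_hours,
        "expected_views": expected_views,
        "deviation_from_expected": item["views"] - expected_views,
        "click_frequency": item["views"] / age_hours,
    }


items = [
    {"category": "python", "views": 10, "posted": 0},
    {"category": "python", "views": 30, "posted": 0},
    {"category": "cars", "views": 5, "posted": 0},
]
expected = expected_views_by_category(items)
usage = usage_features(items[1], expected, now=7200)
```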

  24. Contents 1 Related work 2 CONTENT QUALITY ANALYSIS 3 MODELING CONTENT QUALITY 4 EXPERIMENT & Conclusion

  25. Experiment & Conclusions ——EXPERIMENTAL SETTING • Dataset • Edges induced from the whole dataset.

  26-33. MODELING CONTENT QUALITY ——EXPERIMENTAL SETTING • Dataset statistics (slides 26 through 33 share this title)

  34. Thanks for your attention!
