Using a Sentiment Map for Visualizing Credibility of News Sites on the Web. Yukiko Kawai*, Yusuke Fujita*, Tadahiko Kumamoto**, Jianwei Zhang*, Katsumi Tanaka*** * Kyoto Sangyo University, Japan ** Chiba Institute of Technology, Japan *** Kyoto University, Japan
Outline • Background • Research goal • System overview • Offline processing • Online processing • Experimental evaluation • Conclusion and future work
Background
• Web news sites (e.g., MSN, Google News) have spread rapidly
• Different sites may have different opinions about the same topic
• A question: What is your attitude towards the “Iraq war”? Do you agree or disagree?
• To answer this question, I want to read some news and form an opinion about the topic
Background
• A misconception may arise if the sites’ tendencies are not known in advance
  • The reader asks a single news site: “Is the Iraq war right or wrong?”
  • If it is a pro-war site: “I agree with this war”
  • If it is an anti-war site: “I disagree with this war”
• Showing the sentiment tendencies of sites (positive or negative for Site A, Site B) improves information credibility
  • “Well, I now have opinions from different sites”
  • This may lead to a more fair-minded judgment
A concept of sentiment map
• Example query: “Iraq war”
• Mapping: a sentiment graph (positive/negative) is placed on the map at the location of each news site
• Top-ranked articles from each news site are listed
• Demonstration
System overview
Offline processing (preprocessing)
• Crawl news sites on the Web: Yomiuri (Osaka), Yomiuri (Tokyo), Asahi (Tokyo), ・・・
• News articles collection
• Morphological analysis
• tf-idf value calculation
• Sentiment value calculation, using the sentiment dictionary
• Store the results in the articles database (including tf-idf and sentiment values)
Online processing (runtime processing): from a query to a sentiment map
1) Retrieve articles from each news site
2) Rank the articles based on tf-idf in each site
3) Calculate the average of sentiment values for each site
4) Generate a sentiment map
Offline processing
• News articles collection
  • Crawl news articles from various news sites and store them in a database
• News articles analysis
  • Eliminate HTML tags
  • Perform morphological analysis to extract nouns, verbs, and adjectives
  • Calculate the tf-idf value of each extracted word j for each news article pi, where
    Fj: the frequency of word j in article pi
    Fall: the number of all words in pi
    N: the number of all articles
    Nj: the number of articles including j
  • Attach a sentiment vector to each news article
    • Use a sentiment dictionary
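The tf-idf formula itself did not survive in this transcript. The variable definitions above are consistent with the common form (Fj / Fall) × log(N / Nj); the sketch below assumes that form. The function name `tf_idf` and the token-list input are illustrative and not taken from the slides.

```python
import math
from collections import Counter

def tf_idf(articles):
    """Return one {word: tf-idf} dict per article.

    articles: list of token lists (already reduced to nouns, verbs, adjectives).
    Assumed form: (Fj / Fall) * log(N / Nj); the slides may use a variant.
    """
    n_articles = len(articles)            # N: number of all articles
    doc_freq = Counter()                  # Nj: number of articles including word j
    for tokens in articles:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in articles:
        counts = Counter(tokens)          # Fj: frequency of word j in this article
        total = len(tokens)               # Fall: number of all words in this article
        scores.append({
            word: (freq / total) * math.log(n_articles / doc_freq[word])
            for word, freq in counts.items()
        })
    return scores
```

A word that appears in every article gets a score of 0 under this form, so it does not influence the ranking.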
Sample of sentiment dictionary
• Sentiment dimensions: e = a, b, c, d
• Example entry: Oc(death) = 0.260
• Sentiment value Oe(w) of an entry word w
  • A value between 0 and 1 (e.g., 0: dark, 1: bright)
  • Calculated by analyzing co-occurrence with the original sentiment words, based on 200 million articles of Nikkei newspapers
Calculation of sentiment value Oe(w)
• Each sentiment e corresponds to two groups of original sentiment words, e1 and e2
• The sentiment value is calculated from
  • df(e): the number of occurrences of the original sentiment words e
  • df(e&w): the number of co-occurrences of the original sentiment words e and an entry word w
• Example: the sentiment value of the word “death” on dimension c is Oc(death) = 0.260
  • Because df(“comfortable” & “death”), df(“peaceful” & “death”), df(“slow” & “death”) << df(“tension” & “death”), df(“emergency” & “death”)
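The exact formula for Oe(w) is not preserved in this transcript. As a rough illustration only, the sketch below assumes one plausible normalization: the co-occurrence rate with the e1 words divided by the sum of the rates for e1 and e2. The function `sentiment_value`, its arguments, and the neutral fallback of 0.5 are assumptions, not the authors' definition.

```python
def sentiment_value(df_e1, df_e2, df_e1_w, df_e2_w):
    """Hypothetical O_e(w) in [0, 1]; NOT the formula from the slides.

    df_e1, df_e2:     occurrences of the original sentiment words e1 / e2
    df_e1_w, df_e2_w: co-occurrences of those words with the entry word w
    """
    # Normalize co-occurrence counts by how often each word group occurs at all.
    rate1 = df_e1_w / df_e1 if df_e1 else 0.0
    rate2 = df_e2_w / df_e2 if df_e2 else 0.0
    if rate1 + rate2 == 0:
        return 0.5                        # assumed neutral value when there is no evidence
    return rate1 / (rate1 + rate2)
```

With made-up counts such as `sentiment_value(1000, 1000, 26, 74)`, the result falls well below 0.5, which matches the direction of the “death” example: few co-occurrences with the e1 words and many with the e2 words.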
Sentiment vector O(TEXT) of a news article
• The text of a news article is denoted TEXT
• TEXT contains n keywords, keywords = {w}
• Each sentiment value Oe(TEXT) is calculated from the keywords of TEXT
• Sentiment vector O(w) for each keyword w of the article
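The aggregation formula for Oe(TEXT) is missing from this transcript. A minimal sketch, assuming a plain per-dimension average of the keyword values Oe(w) over the keywords found in the dictionary, could look as follows. The function name, the dictionary layout, and the neutral default of 0.5 are assumptions; the original method may also weight keywords, which this sketch does not do.

```python
def article_sentiment_vector(keywords, dictionary, dimensions=("a", "b", "c", "d")):
    """Average the keyword sentiment values per dimension (assumed aggregation).

    keywords:   list of keywords w extracted from TEXT
    dictionary: {word: {dimension: O_e(word)}}, a hypothetical layout
    """
    known = [dictionary[w] for w in keywords if w in dictionary]
    if not known:
        # Assumed neutral vector when no keyword is in the dictionary.
        return {e: 0.5 for e in dimensions}
    return {e: sum(entry[e] for entry in known) / len(known) for e in dimensions}
```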
Online processing
• When a user enters query keywords, the system:
  • Retrieves news articles including the keywords
  • Ranks the articles based on tf-idf values for each news site
  • Calculates the average of the sentiment vectors of the top n articles for each site
  • Attaches sentiment graphs to the corresponding locations of the news sites
  • Also presents a list of articles grouped by site
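The retrieval, ranking, and averaging steps above can be sketched as follows. The record layout (`site`, `tfidf`, `sentiment`), the requirement that an article contain every query keyword, and ranking by the summed tf-idf of the query keywords are assumptions made for illustration; the slides do not specify these details.

```python
from collections import defaultdict

def site_sentiments(articles, query_terms, top_n=10):
    """Return {site: averaged sentiment vector} for the top-n articles per site.

    articles: list of dicts with keys 'site', 'tfidf' ({word: score}),
              and 'sentiment' ({dimension: value}); a hypothetical layout.
    """
    by_site = defaultdict(list)
    for art in articles:
        # Retrieve: keep articles that include all query keywords (assumed rule).
        if all(term in art["tfidf"] for term in query_terms):
            relevance = sum(art["tfidf"][term] for term in query_terms)
            by_site[art["site"]].append((relevance, art))

    result = {}
    for site, scored in by_site.items():
        # Rank within each site by summed tf-idf of the query keywords.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        top = [art for _, art in scored[:top_n]]
        # Average the sentiment vectors of the top-n articles.
        dims = top[0]["sentiment"].keys()
        result[site] = {
            d: sum(a["sentiment"][d] for a in top) / len(top) for d in dims
        }
    return result
```

The per-site vectors returned here are what a front end would draw as sentiment graphs at each site's location on the map.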
Experimental evaluation
• Query: Daisuke Matsuzaka
  • A famous Japanese Major Leaguer
• A reviewer read all the retrieved articles from the different news sites and judged the sentiment of each news site
  • positive, negative, or neutral
• For comparison, the numeric sentiment values produced by our system were categorized into discrete values
  • positive, negative, or neutral
Experimental evaluation
• Precision is about 70%
• There are some distinctions among the different news sites
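The comparison between system output and reviewer judgment can be sketched as below. The thresholds 0.4 and 0.6 for the neutral band are hypothetical, since the slides do not state how numeric values were categorized, and "precision" is interpreted here as the share of sites where the two labels agree.

```python
def to_label(value, low=0.4, high=0.6):
    """Map a numeric sentiment value in [0, 1] to a discrete label.

    The cut-offs low/high are assumed, not taken from the slides.
    """
    if value > high:
        return "positive"
    if value < low:
        return "negative"
    return "neutral"

def agreement(system_values, reviewer_labels):
    """Fraction of sites where the system's label matches the reviewer's."""
    matches = sum(
        to_label(system_values[site]) == label
        for site, label in reviewer_labels.items()
    )
    return matches / len(reviewer_labels)
```

For example, with made-up inputs `{"A": 0.7, "B": 0.5, "C": 0.3, "D": 0.65}` and reviewer labels positive, neutral, negative, neutral, three of the four sites agree.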
Conclusion and future work
• Conclusion
  • Developed a system called sentiment map for visualizing the sentiment distinction of different news sites
  • Tested its effectiveness
  • A prototype: http://klab.kyoto-su.ac.jp/~fujita/cgi-bin/Fuzilla/News/
• Future work
  • More experiments
  • Sentiment analysis of readers and information recommendation based on it
Sample of sentiment dictionary
• Sentiment dimensions: e = a, b, c, d
• Se(w): impression value
• Me(w): weight
• Example: Sc(death) = 0.260, Mc(death) = 1.306
Sentiment value Oe(w) of an entry word w
• Original impression words and their correspondence with sentiments (e1, e2)
• Sentiment value Oe(w) of an entry word w
  • A value between 0 and 1 (1: positive, 0: negative)
  • Calculated by analyzing the co-occurrence with the original impression words, based on the Nikkei Newspaper Full Text Database (about 200 million articles)
Sentiment value Oe(w) of an entry word w
• Se(w): impression value
• Me(w): weight
• Sentiment value of the word “death” on dimension c: Oc(death) = 0.260
  • Co-occurrence of “comfortable” and “death”, “peaceful” and “death” << co-occurrence of “tension” and “death”, “emergency” and “death”
A proposition of sentiment map
• Example query: “scandal”
• Sentiment map for each news site (graph scale from positive to negative: 0.5, 0, -0.5)
• Top-ranked articles from each news site
• Demonstration