Chinese Blog Clustering by Hidden Sentiment Factors
ADMA 2009
Shi Feng, Daling Wang, Ge Yu, Chao Yang, and Nan Yang
College of Information Science and Engineering, Northeastern University
Hidden Sentiment Factors (HSF)
• Based on probabilistic latent semantic analysis (PLSA)
• Blog set B = {b1, b2, …, bN}
• Sentiment word set W = {w1, w2, …, wM}, taken from:
  • NTUSD: 2,812 positive words and 8,276 negative words
  • HowNet Sentiment Dictionary: 4,566 positive words and 4,370 negative words
• A is an N×M matrix with A(i,j) = Freq(bi, wj), the frequency of sentiment word wj in blog bi
• Hidden sentiment factors Z = {z1, z2, …, zk}
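The matrix A can be built by counting, for each blog, how often each lexicon word occurs. A minimal sketch follows, assuming each blog has already been word-segmented into a token list (Chinese segmentation is not covered in the slides) and that W is the de-duplicated union of the two dictionaries. The function name `build_blog_word_matrix` and the four-word lexicon are hypothetical illustrations, not taken from NTUSD or HowNet.

```python
from collections import Counter


def build_blog_word_matrix(blogs, sentiment_words):
    """Return A with A[i][j] = number of times sentiment_words[j] occurs in blogs[i].

    Assumption: each blog is a list of already-segmented tokens, and
    sentiment_words contains no duplicates.
    """
    index = {w: j for j, w in enumerate(sentiment_words)}
    matrix = []
    for tokens in blogs:
        row = [0] * len(sentiment_words)
        # Count every token once, then keep only tokens found in the lexicon.
        for word, count in Counter(tokens).items():
            j = index.get(word)
            if j is not None:
                row[j] = count
        matrix.append(row)
    return matrix


# Hypothetical toy data: two segmented blogs and a four-word sentiment lexicon.
lexicon = ["好看", "感动", "失望", "无聊"]
blogs = [["电影", "好看", "感动", "好看"], ["剧情", "无聊", "失望"]]
A = build_blog_word_matrix(blogs, lexicon)
print(A)  # [[2, 1, 0, 0], [0, 0, 1, 1]]
```

Non-sentiment tokens such as 电影 are simply ignored, so each row of A describes a blog only through its sentiment vocabulary.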
Hidden Sentiment Factors (HSF)
• Each blog is re-represented from the sentiment-word space to the hidden-factor space: P(w|b) → P(z|b)
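The slides name PLSA but give no training details. The sketch below uses the standard EM updates for the asymmetric PLSA formulation P(w|b) = Σ_z P(z|b) P(w|z): the E-step computes P(z|b,w) ∝ P(z|b) P(w|z), and the M-step re-estimates P(w|z) ∝ Σ_b n(b,w) P(z|b,w) and P(z|b) ∝ Σ_w n(b,w) P(z|b,w). The function name `plsa`, the random initialisation, and the fixed iteration count are assumptions for illustration, not the authors' settings.

```python
import random


def plsa(A, k, iterations=100, seed=0):
    """Fit PLSA to a count matrix A (N blogs x M words) with EM.

    Returns (p_z_b, p_w_z) where p_z_b[i][z] = P(z | b_i) and
    p_w_z[z][j] = P(w_j | z). Random initialisation is an assumption.
    """
    rng = random.Random(seed)
    n_blogs, n_words = len(A), len(A[0])

    def normalize(row):
        total = sum(row)
        return [x / total for x in row] if total > 0 else [1.0 / len(row)] * len(row)

    p_z_b = [normalize([rng.random() for _ in range(k)]) for _ in range(n_blogs)]
    p_w_z = [normalize([rng.random() for _ in range(n_words)]) for _ in range(k)]

    for _ in range(iterations):
        new_w_z = [[0.0] * n_words for _ in range(k)]
        new_z_b = [[0.0] * k for _ in range(n_blogs)]
        for i in range(n_blogs):
            for j in range(n_words):
                n = A[i][j]
                if n == 0:
                    continue
                # E-step: posterior P(z | b_i, w_j) proportional to P(z | b_i) P(w_j | z).
                post = [p_z_b[i][z] * p_w_z[z][j] for z in range(k)]
                s = sum(post)
                if s == 0:
                    continue
                for z in range(k):
                    # Expected count of (b_i, w_j) explained by factor z.
                    r = n * post[z] / s
                    new_w_z[z][j] += r
                    new_z_b[i][z] += r
        # M-step: normalising the expected counts gives the new distributions.
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_b = [normalize(row) for row in new_z_b]
    return p_z_b, p_w_z


# Hypothetical counts: blogs 0-1 use words 0-1, blogs 2-3 use words 2-3.
A_toy = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 5, 4], [0, 0, 4, 5]]
p_z_b, p_w_z = plsa(A_toy, k=2, iterations=200)
for i, row in enumerate(p_z_b):
    print(i, [round(x, 3) for x in row])
```

Each row of `p_z_b` is the k-dimensional HSF vector of one blog; on this toy matrix the two blog groups end up dominated by different factors.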
Clustering by HSF
• K-Means algorithm
• k′: number of clusters; in this paper, k′ = k (the number of hidden sentiment factors)
• Fig. 1: Similarity = 0
• Fig. 2: Similarity = ?
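Clustering then runs K-Means on the P(z|b) vectors with k′ = k. A minimal sketch of plain Lloyd's K-Means follows; Euclidean distance and the deterministic farthest-first initialisation are assumptions chosen to keep the example reproducible, since the slides do not specify either. The function name `kmeans` and the example vectors are hypothetical.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's K-Means with Euclidean distance; returns one cluster label per point.

    Initialisation (an assumption, not from the slides): start from points[0],
    then repeatedly add the point farthest from all centroids chosen so far.
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical HSF vectors P(z|b) for four blogs with k = 2 factors, so k' = 2.
hsf = [[0.9, 0.1], [0.8, 0.2], [0.15, 0.85], [0.05, 0.95]]
print(kmeans(hsf, 2))  # [0, 0, 1, 1]
```

Because the vectors live in the k-dimensional factor space rather than the M-dimensional word space, each distance computation costs O(k) instead of O(M).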
Experiment
1. Collect blog reviews of Stephen Chow's movie “CJ7” (Long River 7).
2. Collect blog entries about Liu Xiang posted since 2008/8/18.
• Tag1: “Positive”, “Negative”, or “Neutral”
• Tag2: “Irrelevant” or not
• Example: a blog may be tagged {“Positive”, “Irrelevant”}, {“Neutral”}, or {“Negative”, “Irrelevant”}
• Evaluate the clustering purity.
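Clustering purity is commonly defined as (1/N) Σ_j max_c |ω_j ∩ c_c|: each cluster ω_j is credited with its majority gold label c_c, and the credited counts are summed and divided by N. The slides do not say how the two tag dimensions are combined, so the sketch below assumes purity is scored on one tag dimension at a time (Tag1 here). The function name `purity` and the six-blog example are hypothetical.

```python
from collections import Counter


def purity(cluster_labels, true_labels):
    """Purity = (1/N) * sum over clusters of the count of that cluster's majority gold label."""
    clusters = {}
    for c, t in zip(cluster_labels, true_labels):
        clusters.setdefault(c, []).append(t)
    majority_total = sum(Counter(members).most_common(1)[0][1] for members in clusters.values())
    return majority_total / len(true_labels)


# Hypothetical example: 6 blogs in 2 clusters, scored against Tag1 only.
assignments = [0, 0, 0, 1, 1, 1]
tag1 = ["Positive", "Positive", "Negative", "Negative", "Negative", "Neutral"]
print(purity(assignments, tag1))  # 0.6666666666666666
```

Cluster 0 has majority “Positive” (2 of 3) and cluster 1 has majority “Negative” (2 of 3), giving 4/6. Scoring Tag2 would use the same function with “Irrelevant”/“Relevant” labels.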