Using Personal-Characteristic and Friend-Ranking in Blog Search D93944003 趙國成 D97921018 陳信宏 2009/1/9
Outline • Scope • Problem • Solution • Evaluation • Conclusion & Future Works
Scope • The search targets are at the document level, i.e., entries of a feed. • The search targets are text only (no photos, videos, audio, etc.).
Distinctive Features of Blogs • Each article belongs to a specific category. • Each article belongs to a member who has personal characteristics, such as interests. • Members may have friends, so they form a social network.
Problem • How can this information be used to improve search effectiveness? • Category • Personal Characteristic • Friend relation
Solution • Query = Keyword + Category • Weighting of Personal Characteristic • How many articles has he posted? • Does his interest fall into the queried category? • Weighting of friends • Are his friends also interested in the queried category?
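The slides pose these two weights as questions rather than formulas. A minimal sketch, under our own assumptions: the personal weight Rpsn(m, c) is the share of member m's articles in the queried category c, damped so that members with few articles score lower, and the friend weight Rfnd(m, c) is the mean Rpsn(f, c) over m's friends f. The function names, the `damping` parameter, and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set

def personal_rank(member_categories: Dict[str, List[str]], member: str,
                  category: str, damping: float = 1.0) -> float:
    """Hypothetical Rpsn(m, c): share of m's articles that are in category c.

    Adding `damping` to the denominator lowers the score of members who have
    posted only a few articles ("how many articles has he posted?").
    """
    cats = member_categories.get(member, [])
    if not cats:
        return 0.0  # a member with no articles gets no personal weight
    return Counter(cats)[category] / (len(cats) + damping)

def friend_rank(member_categories: Dict[str, List[str]],
                friends: Dict[str, Set[str]], member: str,
                category: str, damping: float = 1.0) -> float:
    """Hypothetical Rfnd(m, c): mean Rpsn(f, c) over m's friends f (0 if none)."""
    fs = friends.get(member, set())
    if not fs:
        return 0.0
    return sum(personal_rank(member_categories, f, category, damping)
               for f in fs) / len(fs)

# Toy data: the category of each article, listed per member.
member_categories = {
    "alice": ["travel", "travel", "food"],
    "bob": ["travel"],
    "carol": ["finance"],
}
friends = {"alice": {"bob", "carol"}}

print(personal_rank(member_categories, "alice", "travel"))         # 2 / (3 + 1) = 0.5
print(friend_rank(member_categories, friends, "alice", "travel"))  # (0.5 + 0.0) / 2 = 0.25
```

Averaging over friends, rather than summing, keeps Rfnd in the same range as Rpsn regardless of how many friends a member has.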
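More Precise Definitions • Final ranking R(d) of a document d, written by member m, for the queried category c, is computed from: • Rdoc: result of the LM • Rpsn: personal ranking of member m for category c • Rfnd: friend ranking of member m for category c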
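The slides name the three components but do not state here how they are combined. As an assumption for illustration only, one common choice is a linear interpolation using the weights α and β named under Evaluation, where m is the author of d and c is the queried category:

```latex
R(d) = (1 - \alpha - \beta)\,R_{doc}(d) + \alpha\,R_{psn}(m, c) + \beta\,R_{fnd}(m, c),
\qquad \alpha, \beta \ge 0,\quad \alpha + \beta \le 1
```

With α = β = 0 this form reduces to the pure LM score Rdoc, the baseline listed under Evaluation. Because an LM score is often a log-likelihood, Rdoc would likely need rescaling to the same range as Rpsn and Rfnd before mixing.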
Implementation Steps • Define categories • Crawl pages from blog sites by category • Build the language model (LM) of the documents in each category. • Generate the member-page mapping. • Generate the member-friend mapping.
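The slides do not say which language model or smoothing method produces Rdoc. A minimal sketch, assuming a standard unigram query-likelihood model with Dirichlet smoothing (the parameter `mu`, the function names, and the toy documents are hypothetical), applied to the documents of one category. Documents and queries are token lists; Chinese blog text would first need word segmentation, which this sketch does not do.

```python
import math
from collections import Counter
from typing import Dict, List

def query_likelihood(query: List[str], doc: List[str], collection: Counter,
                     collection_len: int, mu: float = 2000.0) -> float:
    """log P(query | doc) under a Dirichlet-smoothed unigram LM."""
    tf = Counter(doc)
    score = 0.0
    for term in query:
        p_coll = collection[term] / collection_len if collection_len else 0.0
        if p_coll == 0.0:
            continue  # term never appears in the collection: skip it
        # Mix the document's term frequency with the collection probability.
        score += math.log((tf[term] + mu * p_coll) / (len(doc) + mu))
    return score

def rank_documents(query: List[str], docs: Dict[str, List[str]],
                   mu: float = 2000.0) -> List[str]:
    """Return document ids sorted by descending query likelihood."""
    collection: Counter = Counter()
    for tokens in docs.values():
        collection.update(tokens)
    total = sum(collection.values())
    scores = {d: query_likelihood(query, toks, collection, total, mu)
              for d, toks in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)

docs = {
    "d1": ["taipei", "food", "night", "market"],
    "d2": ["stock", "market", "news"],
}
print(rank_documents(["food", "market"], docs, mu=10.0))  # ['d1', 'd2']
```

`mu = 2000` is a commonly used default in the IR literature; on a toy collection this small a much smaller value is used so the document counts still matter.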
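Define Categories • We empirically define 13 categories. • We hope the categories are mutually independent. 1. Creative works (創作) 2. Travel (旅遊) 3. Food (美食) 4. Medicine & health care (醫療保健) 5. Sports (運動) 6. Film & TV (影視) 7. Lifestyle & leisure (生活休閒) 8. Science & technology (科學科技) 9. Anime, comics & video games (動漫電玩) 10. Learning (學習) 11. Finance (財經) 12. Society, politics & economics (社會政經) 13. Other (其它)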
Crawl pages from blog sites by category • Many blog websites let users browse articles by category. • But not all of them do. • We crawl pages from the websites that provide this function and use them as training documents. • For other documents, we use a text classification algorithm to determine their categories.
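The slides leave the classification algorithm unspecified. As one possible choice (a substitution, not necessarily the authors' method), a multinomial naive Bayes classifier with Laplace smoothing can be trained on the crawled, category-labeled pages and then used to label the remaining documents. The function names and toy data are hypothetical; inputs are pre-segmented token lists.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Set, Tuple

Model = Tuple[Counter, Dict[str, Counter], Set[str]]

def train_nb(labeled: List[Tuple[List[str], str]]) -> Model:
    """Count documents and tokens per category from (tokens, category) pairs."""
    doc_counts: Counter = Counter()
    term_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab: Set[str] = set()
    for tokens, cat in labeled:
        doc_counts[cat] += 1
        term_counts[cat].update(tokens)
        vocab.update(tokens)
    return doc_counts, term_counts, vocab

def classify_nb(tokens: List[str], model: Model) -> Optional[str]:
    """Return the category with the highest log posterior."""
    doc_counts, term_counts, vocab = model
    n_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for cat, n in doc_counts.items():
        total = sum(term_counts[cat].values())
        score = math.log(n / n_docs)  # log prior of the category
        for t in tokens:
            if t not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothed log likelihood of the token.
            score += math.log((term_counts[cat][t] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = cat, score
    return best

training = [
    (["beach", "hotel", "flight"], "travel"),
    (["noodles", "restaurant"], "food"),
    (["hotel", "trip"], "travel"),
]
model = train_nb(training)
print(classify_nb(["flight", "hotel"], model))  # travel
print(classify_nb(["noodles"], model))          # food
```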
Generate the member-page mapping • On almost all blog websites, the URL of each page contains the member ID. • http://www.wretch.cc/blog/ddedogtoootw/9759034 • http://blog.udn.com/wong2006/2547710 • http://tw.myblog.yahoo.com/jun681031-bear/article?mid=5556
Generate the member-page mapping • We can easily derive URL rules (regular expressions) for each site and fetch the member ID from the URL. • http://www.wretch.cc/blog/ddedogtoootw/9759034 → http://www.wretch.cc/blog/ddedogtoootw/ → ddedogtoootw • http://www.wretch.cc/blog/minyang0925/20688505 → http://www.wretch.cc/blog/minyang0925/ → minyang0925 • http://www.wretch.cc/blog/greezydebut/7175512 → http://www.wretch.cc/blog/greezydebut/ → greezydebut
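A minimal sketch of such rules, with one regular expression per site written from the three example URL formats (the patterns and function names are our assumptions and would need checking against each site's full URL scheme). Pages whose URL matches no rule are left out of the mapping.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

# Hypothetical per-site patterns derived from the example URLs only.
MEMBER_ID_PATTERNS = [
    re.compile(r"^https?://www\.wretch\.cc/blog/([^/?#]+)"),
    re.compile(r"^https?://blog\.udn\.com/([^/?#]+)"),
    re.compile(r"^https?://tw\.myblog\.yahoo\.com/([^/?#]+)"),
]

def member_id(url: str) -> Optional[str]:
    """Return the member ID embedded in a blog URL, or None if no rule matches."""
    for pattern in MEMBER_ID_PATTERNS:
        m = pattern.match(url)
        if m:
            return m.group(1)
    return None

def member_page_mapping(urls: Iterable[str]) -> Dict[str, List[str]]:
    """Group page URLs by the member ID extracted from each URL."""
    mapping: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        mid = member_id(url)
        if mid is not None:
            mapping[mid].append(url)
    return dict(mapping)

print(member_id("http://tw.myblog.yahoo.com/jun681031-bear/article?mid=5556"))
# jun681031-bear
```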
Generate the member-friend mapping • What is the definition of a friend? • My friend? • Somebody who set me as his friend? • Somebody who has visited my blog? • Somebody who has commended my blog? • Somebody who has left messages for me? • … • Which definition is suitable for each blog website?
Generate the member-friend mapping • Our definition • Somebody whose page URLs occur in my articles. • This relation usually arises from a “reply”. • Source article: http://www.wretch.cc/blog/illyqueen/12364112 • … • http://www.wretch.cc/blog/love6380/20856457 • http://www.wretch.cc/blog/oeoehaha/5943390 • http://www.wretch.cc/blog/parfaite/15050239 • …
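Under this definition, the friend mapping can be built by scanning each article for links to other members' pages. A minimal sketch, assuming articles are available as (author ID, article text) pairs and handling only wretch.cc links like those in the example; the regex, function name, and toy article text are hypothetical, and links to the author's own pages are skipped.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical pattern: captures the member ID from any wretch.cc blog link.
WRETCH_LINK = re.compile(r"https?://www\.wretch\.cc/blog/([\w-]+)")

def friend_mapping(articles: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each author to the members whose page URLs occur in the author's articles."""
    friends: Dict[str, Set[str]] = defaultdict(set)
    for author, text in articles:
        for other in WRETCH_LINK.findall(text):
            if other != author:  # a link to one's own blog is not a friend
                friends[author].add(other)
    return dict(friends)

articles = [(
    "illyqueen",
    "replies: http://www.wretch.cc/blog/love6380/20856457 "
    "http://www.wretch.cc/blog/oeoehaha/5943390 "
    "http://www.wretch.cc/blog/parfaite/15050239",
)]
print(sorted(friend_mapping(articles)["illyqueen"]))
# ['love6380', 'oeoehaha', 'parfaite']
```

The relation built this way is directed (A links to B does not imply B links to A); whether to make it symmetric is a design choice the definition above leaves open.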
Conclusion of Solution • For each article, we know its category and author. • For each member (author), we know all the articles he has posted and his friends. • Hence we can calculate R(d).
Evaluation • How do we decide whether a document is relevant? • Feedback from users. • Comparison • Rdoc (pure LM) • R (LM + Rpsn + Rfnd) • What are the effects of α and β?
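The slides do not name an evaluation metric or the exact form of R. One simple option, assumed here, is precision at k over the documents the user marked relevant, computed for a grid of (α, β) values; the pair (0, 0) gives the pure-LM baseline Rdoc. The linear form of R, the min-max rescaling, all function names, and the toy scores are hypothetical. Each score dictionary is keyed by document, so Rpsn and Rfnd hold the values for each document's author.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that the user marked relevant."""
    return sum(1 for d in ranking[:k] if d in relevant) / k

def minmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so LM log-likelihoods and weights are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {d: (s - lo) / span if span else 0.0 for d, s in scores.items()}

def combined_ranking(r_doc: Dict[str, float], r_psn: Dict[str, float],
                     r_fnd: Dict[str, float], alpha: float, beta: float) -> List[str]:
    """Rank documents by the assumed R = (1 - a - b)*Rdoc + a*Rpsn + b*Rfnd."""
    n_doc, n_psn, n_fnd = minmax(r_doc), minmax(r_psn), minmax(r_fnd)
    r = {d: (1 - alpha - beta) * n_doc[d] + alpha * n_psn[d] + beta * n_fnd[d]
         for d in r_doc}
    return sorted(r, key=r.get, reverse=True)

def grid_search(r_doc, r_psn, r_fnd, relevant: Set[str], k: int,
                steps=(0.0, 0.25, 0.5)) -> Dict[Tuple[float, float], float]:
    """Precision@k for every (alpha, beta) pair with alpha + beta <= 1."""
    results: Dict[Tuple[float, float], float] = {}
    for alpha, beta in product(steps, steps):
        if alpha + beta <= 1:
            ranking = combined_ranking(r_doc, r_psn, r_fnd, alpha, beta)
            results[(alpha, beta)] = precision_at_k(ranking, relevant, k)
    return results

r_doc = {"d1": -5.0, "d2": -4.0, "d3": -6.0}  # LM log-likelihoods
r_psn = {"d1": 0.9, "d2": 0.1, "d3": 0.5}
r_fnd = {"d1": 0.6, "d2": 0.0, "d3": 0.4}
results = grid_search(r_doc, r_psn, r_fnd, relevant={"d1"}, k=1)
print(results[(0.0, 0.0)], results[(0.5, 0.0)])  # 0.0 1.0
```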
Conclusion • We use this information to improve search effectiveness: • Category • Personal Characteristic • Friend relation • We will compare search effectiveness with and without our method.
Future Works • Should we consider feeds instead of entries? • Are there better definitions of Personal Characteristic and Friend? • Is there a better equation for R(Rdoc, Rpsn, Rfnd)?
Thank you • We appreciate your suggestions!