Discovering Important Bloggers based on Analyzing Blog Threads by Nakajima et al.
Thomas van der Elsen, Richard Lawrence, Jumi Oladimeji, Alastair Smith
Introduction
• People increasingly publish their reactions to public events using a blog
  • A tool that enables this information to be published quickly
  • A journal that is available on the web
• Need for effective data-mining techniques specific to blogs and similar tools (e.g. the Semantic Web)
• “Our goal is to develop a method of capturing hot conversations by automating readers’ processes for characterizing and monitoring blogs.”
Overview
• Data-mining techniques
  • Creation of blog link structure
  • Analysing link structure
• Types of important bloggers
  • Agitators
  • Summarisers
• Applications, analysis and conclusions
  • Real-world applications and extensions
  • Pros and cons of the paper
Data-Mining Techniques
• Crawling blogs
• Extracting hyperlinks
• Extracting blog threads
Crawling blogs
• Diagram labels: OPML, Aggregators, RSS feeds, RSS list
• The system crawls through the RSS list, registering for each entry:
  • Title
  • Permalink
  • List entry date
• Aggregator: gathers RSS feeds from multiple sources and organises them
• OPML: a file format used to share RSS feed lists
• RSS: a format for distributing content on the web
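The registration step can be illustrated with a minimal sketch. It assumes plain RSS 2.0 items with `title`, `link` and `pubDate` children. The sample feed, the `register_entries` name and the dictionary keys are hypothetical and do not come from the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-item RSS 2.0 feed used only for illustration.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <link>http://blog.example/2006/01/first</link>
      <pubDate>Mon, 02 Jan 2006 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>http://blog.example/2006/01/second</link>
      <pubDate>Tue, 03 Jan 2006 09:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def register_entries(rss_text):
    """Return one record (title, permalink, date) per <item> in the feed."""
    root = ET.fromstring(rss_text)
    entries = []
    for item in root.iter("item"):
        entries.append({
            "title": item.findtext("title", default=""),
            # Assumption: the item's <link> is treated as its permalink.
            "permalink": item.findtext("link", default=""),
            "date": item.findtext("pubDate", default=""),
        })
    return entries


entries = register_entries(SAMPLE_RSS)
```

A real crawler would fetch each feed named in the RSS list over HTTP; the inline string keeps the sketch self-contained.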
Extracting hyperlinks
• Problem: different tag structures per server
• Diagram labels: RSS feed from list, Blog entries, Description, Hyperlink list
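As a rough illustration of turning an entry's description into a hyperlink list, the sketch below collects every `href` found in anchor tags. It deliberately ignores the per-server tag structures mentioned above, so it is a simplification rather than the paper's extractor. `LinkCollector` and `extract_hyperlinks` are hypothetical names.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_hyperlinks(description_html):
    """Return the hyperlinks found in one entry description, in order."""
    collector = LinkCollector()
    collector.feed(description_html)
    collector.close()
    return collector.links


links = extract_hyperlinks(
    '<p>See <a href="http://a.example/1">this</a> and '
    "<A HREF='http://b.example/2'>that</A>.</p>"
)
```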
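Extracting blog threads
Flowchart (layout not recovered); its recoverable elements are:
• Input: a hyperlink, handled as either a replyLink or a sourceLink
• Checks: departure URL exists in thread data; destination URL points to an entry on the list; links exist in thread data
• Check results are combined (&&) into the cases 11, 10, 01 and 00
• Actions: add destination entry to thread; add departure entry to thread; create new thread; add destination entry to entry list and add to thread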
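Because the flowchart's branch-to-action mapping is not recoverable here, the following is only a sketch of one plausible reading: each link joins its departure and destination entries into a thread, with four cases depending on which of the two is already in thread data. Merging two existing threads is an assumption of this sketch, not something the slide states, and `build_threads` is a hypothetical name.

```python
def build_threads(links):
    """Group entries into threads from (departure_url, destination_url) links."""
    thread_of = {}  # entry URL -> thread id
    threads = {}    # thread id -> set of entry URLs
    next_id = 0
    for departure, destination in links:
        dep_thread = thread_of.get(departure)
        dest_thread = thread_of.get(destination)
        if dep_thread is None and dest_thread is None:
            # Neither entry is known: create a new thread holding both.
            threads[next_id] = {departure, destination}
            thread_of[departure] = thread_of[destination] = next_id
            next_id += 1
        elif dep_thread is None:
            # Only the destination is known: add the departure entry.
            threads[dest_thread].add(departure)
            thread_of[departure] = dest_thread
        elif dest_thread is None:
            # Only the departure is known: add the destination entry.
            threads[dep_thread].add(destination)
            thread_of[destination] = dep_thread
        elif dep_thread != dest_thread:
            # Both known but in different threads: merge (assumption).
            for url in threads[dest_thread]:
                thread_of[url] = dep_thread
            threads[dep_thread] |= threads.pop(dest_thread)
    return list(threads.values())


threads = build_threads([
    ("b", "a"), ("c", "a"), ("e", "d"), ("d", "c"), ("y", "x"),
])
```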
Types of users
• Agitators
• Summarisers
• Joe Bloggs
Agitators
• Discussion stimulator
• Threads often grow after an agitator’s entry
• Three discriminants for an agitator:
  • Link (Agi1)
  • Popularity (Agi2)
  • Topic (Agi3)
• The three discriminants can be weighted using the following formula: [formula image not recovered]
Link-based discriminant
$e_x$ is an agitator if $k_x > \theta_1$
• $e_x$ = a blog entry
• $k_x$ = number of entries in $\mathrm{thread}_i$ with a replyLink to $e_x$
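A small sketch of this test follows, assuming a thread is given as a collection of entry identifiers and replyLinks as (source, target) pairs. Both data shapes and the `link_agitators` name are hypothetical. Each linking entry is counted once, however many links it contains.

```python
def link_agitators(thread_entries, reply_links, theta1):
    """Return the entries e_x whose k_x exceeds theta1."""
    entries = set(thread_entries)
    # For each entry, the set of distinct thread entries replying to it.
    repliers = {entry: set() for entry in entries}
    for source, target in reply_links:
        if source in entries and target in entries and source != target:
            repliers[target].add(source)
    # k_x is the number of distinct replying entries.
    return {entry for entry, sources in repliers.items()
            if len(sources) > theta1}


agitators = link_agitators(
    ["e1", "e2", "e3", "e4"],
    [("e2", "e1"), ("e3", "e1"), ("e4", "e1"), ("e4", "e3")],
    theta1=2,
)
```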
Popularity-based discriminant
$e_x$ is an agitator if $l_x / m_x > \theta_2$
• $e_x$ = a blog entry
• $l_x$ = number of entries in $\mathrm{thread}_i$ published in the $t$ days after $e_x$
• $m_x$ = number of entries in $\mathrm{thread}_i$ published in the $t$ days before $e_x$
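The ratio can be sketched as below, reading "t days after/before" as a window of t days on either side of the entry. The slide does not say what happens when no entries precede $e_x$, so the zero-denominator rule here is purely an assumption, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def popularity_counts(entry_time, thread_times, t_days):
    """Return (l_x, m_x): entries within t days after and before the entry."""
    window = timedelta(days=t_days)
    l_x = sum(1 for ts in thread_times
              if entry_time < ts <= entry_time + window)
    m_x = sum(1 for ts in thread_times
              if entry_time - window <= ts < entry_time)
    return l_x, m_x


def is_popularity_agitator(entry_time, thread_times, t_days, theta2):
    l_x, m_x = popularity_counts(entry_time, thread_times, t_days)
    if m_x == 0:
        # Assumption: with no earlier entries, any later activity counts.
        return l_x > 0
    return l_x / m_x > theta2


entry = datetime(2006, 1, 10)
others = [datetime(2006, 1, d) for d in (8, 9, 11, 12, 13, 14, 15)]
counts = popularity_counts(entry, others, t_days=3)
```

With these sample dates, three entries fall in the three days after the entry and two in the three days before, so the ratio is 1.5.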
Topic-based discriminant
$e_x$ is an agitator if [condition not recovered]
• $e_x$ = a blog entry
• $n$ = number of entries
Summarisers
• Publish entries that collate and compact previous posts
• Provide a convenient way of digesting an entire thread
• The discriminant for summarisers is link-based: $e_x$ is a summariser if $p_x > \theta_4$
  • $e_x$ = a blog entry
  • $p_x$ = number of entries in $\mathrm{thread}_i$ that have a replyLink from $e_x$
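The summariser test mirrors the link-based agitator test but counts outgoing links. The sketch below assumes the same hypothetical data shapes: entry identifiers for the thread and (source, target) pairs for replyLinks.

```python
def link_summarisers(thread_entries, reply_links, theta4):
    """Return the entries e_x whose p_x exceeds theta4."""
    entries = set(thread_entries)
    # For each entry, the set of distinct thread entries it links to.
    targets = {entry: set() for entry in entries}
    for source, target in reply_links:
        if source in entries and target in entries and source != target:
            targets[source].add(target)
    # p_x is the number of distinct entries receiving a replyLink from e_x.
    return {entry for entry, linked in targets.items()
            if len(linked) > theta4}


summarisers = link_summarisers(
    ["e1", "e2", "e3", "e4"],
    [("e4", "e1"), ("e4", "e2"), ("e4", "e3"), ("e2", "e1")],
    theta4=2,
)
```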
Analysis
• Applications
• Pros and Cons
• Conclusions
Applications
• Supplementary info, e.g. for TV, news sites, etc.
  • Home and Away – who shot Josh West
    • Agitator
  • Sports, etc. – used by studios and media to highlight points of interest in a match
    • Summariser
Analysis – Pros
• Basis for future research – a brief intro to the subject
• Multiple thread analysis
• Identification of areas of bloggers’ expertise
• Highly effective in certain specific areas
  • News and reviews
• Implementation of theory (feature vector)
Analysis – Cons
• Only 25 sites used in sample (but 1000s of blogs)
• Does not take context into consideration
  • E.g., an agitator may be posting offensive entries
• No measurement of summary success
• Comments are not analysed
• Inappropriate for certain areas
  • MySpace, Bebo, et al. (due to target audience)
Conclusions
• Created a data-mining framework for future research
• May instigate research into further work
• Nice idea and potentially useful but needs to be extended
Any Questions?
Thank you for your time