This research explores how blog communities form based on mutual awareness among bloggers and presents an analytical framework for community extraction using performance metrics and experimental results.
Discovery of Blog Communities based on Mutual Awareness Yu-Ru Lin, Hari Sundaram, Yun Chi, Jun Tatemura, Belle Tseng [3rd Annual Workshop on the Weblogging Ecosystems] Advisor: Dr. Koh Jia-Ling Reporter: Che-Wei, Liang Date: 2008/06/19
Outline • Introduction • Analytical framework • Community formation • Extracting blog communities • Performance metrics • Experiment
Introduction • Blogs • Have become a prominent form of social media • Enable users to publish content quickly • Blog communities differ from traditional web communities • They form based on mutual awareness
Introduction • Mutual awareness • Individual bloggers become aware of each other’s presence through interaction (e.g., comments, trackbacks). • If no blogger is aware of the others’ actions, the bloggers do not form a community.
Introduction • New approach to community extraction Two steps: (a) Analysis of mutual awareness from bloggers’ actions. (b) Ranking-based community extraction from mutual awareness.
Analytical framework (1/3) • Community emerges through individual bloggers’ behavior • Individual bloggers are aware of each other’s presence through interaction (Bi-directional property) • Community needs to sustain over time
Analytical framework (2/3) • Properties of blogs 1. Temporal dynamics: Blogs represent easily editable content. 2. Event locality: A typical blog entry is time sensitive. 3. Link semantics: A hyperlink can have different semantics. 4. Community centric: People who are interested in each other’s content.
Analytical framework (3/3) • Community extraction problem • Distinct from traditional ranking problems on the web. • Key difference is the semantics of the hyperlinked structure. (Figure labels: Hub, Authority)
Community Formation • Acts of bloggers in the blogosphere 1. Surf/read 2. Create entries 3. Comment 4. Change blogroll
Extracting blog communities • Extracting blog communities 1. Computing Mutual Awareness 2. Ranking-Based Clustering Method
Computing Mutual Awareness (1/7) • Mutual awareness is affected by • Type of action • Number of actions for each type • When the action occurred • Mutual awareness depends on sustained actions
Computing Mutual Awareness (2/7) • Graph G = (V, E): nodes = bloggers, edges = connections between bloggers, and the edge weight wij is a function of mutual awareness • Mutual awareness matrix M • A weighted linear combination of action matrices
Computing Mutual Awareness (3/7) • For each action type k at time t, compute Temporal action matrix Xk,t • Each entry xij,k,t of matrix represents the number of times the kth action ak was performed by blogger i on blogger j. • e.g. Blogger i leaves a comment on blogger j’s entry.
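The counting step on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the log format `(i, j, k, t)`, the function name `build_action_matrices`, and the sample data are all assumptions made for the example.

```python
def build_action_matrices(actions, n_bloggers):
    """Return {(k, t): X} where X[i][j] counts how many times blogger i
    performed action type k on blogger j during time step t."""
    matrices = {}
    for i, j, k, t in actions:
        if (k, t) not in matrices:
            matrices[(k, t)] = [[0] * n_bloggers for _ in range(n_bloggers)]
        matrices[(k, t)][i][j] += 1
    return matrices


# Hypothetical action log: (actor i, target j, action type k, week t)
log = [
    (0, 1, "comment", 1),    # blogger 0 comments on blogger 1's entry
    (0, 1, "comment", 1),    # ... and does so a second time in week 1
    (1, 0, "trackback", 1),  # blogger 1 sends a trackback to blogger 0
    (2, 0, "comment", 2),    # blogger 2 comments on blogger 0 in week 2
]
X = build_action_matrices(log, n_bloggers=3)
print(X[("comment", 1)])
```

Keying the matrices by `(k, t)` keeps one matrix per action type and time step, which matches the Xk,t notation on the slide.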
Computing Mutual Awareness (4/7) • (Figure: example action graph with its temporal action matrices X1,t and X2,t; action type 1 in yellow, action type 2 in blue, action type 3 in red.)
Computing Mutual Awareness (5/7) • The effect of actions on mutual awareness diminishes gradually. (λk is the decay factor for action type k)
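The decay formula itself is not preserved in this transcript, so the sketch below assumes one common form: each temporal action matrix is weighted by λk raised to its age, with 0 < λk < 1. The geometric form, the function name `decayed_sum`, and the sample numbers are assumptions for illustration only.

```python
def decayed_sum(matrices_by_time, decay, now):
    """Aggregate temporal action matrices for one action type.

    ASSUMPTION: a matrix observed at time t contributes with weight
    decay ** (now - t); the slide's exact formula may differ.
    """
    n = len(next(iter(matrices_by_time.values())))
    total = [[0.0] * n for _ in range(n)]
    for t, X in matrices_by_time.items():
        weight = decay ** (now - t)
        for i in range(n):
            for j in range(n):
                total[i][j] += weight * X[i][j]
    return total


# Hypothetical data: blogger 0 acted on blogger 1 twice in week 1, once in week 3
comments = {
    1: [[0, 2], [0, 0]],
    3: [[0, 1], [0, 0]],
}
A = decayed_sum(comments, decay=0.5, now=3)
print(A[0][1])  # 0.5**2 * 2 + 0.5**0 * 1 = 1.5
```

Under this assumption, actions that are not repeated fade from the score, which is one way to express the requirement that mutual awareness depends on sustained actions.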
Computing Mutual Awareness (6/7) • Two specific types of actions (a) create an entry-to-entry link (mij=0 if mij < λm) (b) send a trackback (r denotes how likely the trackback receiver is to be aware of the trackback sender by the action of sending a trackback)
Computing Mutual Awareness (7/7) • Fusion of Actions • Assume actions ak in the action set are independent of each other.
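Since the mutual awareness matrix M is described as a weighted linear combination of action matrices, the fusion step can be sketched as below. The weights, the function name `fuse_actions`, and the example matrices are hypothetical; the slides do not give the actual weight values.

```python
def fuse_actions(action_matrices, weights):
    """Combine per-action matrices into one matrix M as a weighted sum.

    action_matrices: {action type: n x n matrix}
    weights:         {action type: weight}  (hypothetical values)
    """
    kinds = list(action_matrices)
    n = len(action_matrices[kinds[0]])
    M = [[0.0] * n for _ in range(n)]
    for k in kinds:
        for i in range(n):
            for j in range(n):
                M[i][j] += weights[k] * action_matrices[k][i][j]
    return M


# Hypothetical per-action scores for two bloggers
per_action = {
    "entry_link": [[0, 2], [1, 0]],
    "trackback": [[0, 1], [0, 0]],
}
M = fuse_actions(per_action, weights={"entry_link": 0.5, "trackback": 1.0})
print(M)  # [[0.0, 2.0], [0.5, 0.0]]
```

A plain sum of this kind relies on the independence assumption stated on the slide: each action type contributes separately, with no interaction terms.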
Ranking-Based Clustering Method • Start from highly ranked blogs to extract dense subgraphs that include popular bloggers 1. Choose seeds based on PageRank (rt is the vector of ranking scores, α is the damping factor) 2. Find community members associated with the seeds 3. Iterate steps 1 and 2 to discover N communities, excluding blogs already in existing communities
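The three steps can be sketched as follows. PageRank is computed by standard power iteration; the membership rule (a blog joins a seed's community when awareness in both directions reaches `min_weight`) is a stand-in chosen for illustration, because the slides do not say how members are associated with seeds. For brevity the ranking is computed once rather than recomputed after each community is removed. All names and the sample graph are hypothetical.

```python
def pagerank(W, alpha=0.85, iters=100):
    """Power-iteration PageRank on a weighted adjacency matrix W."""
    n = len(W)
    r = [1.0 / n] * n
    out = [sum(row) for row in W]
    for _ in range(iters):
        nxt = [(1.0 - alpha) / n] * n
        for i in range(n):
            for j in range(n):
                if out[i] == 0:
                    nxt[j] += alpha * r[i] / n  # dangling node: spread evenly
                else:
                    nxt[j] += alpha * r[i] * W[i][j] / out[i]
        r = nxt
    return r


def extract_communities(W, n_communities, min_weight=1.0):
    """Greedy seed-and-expand extraction (membership rule is an assumption)."""
    rank = pagerank(W)
    unassigned = set(range(len(W)))
    communities = []
    while unassigned and len(communities) < n_communities:
        seed = max(unassigned, key=lambda v: rank[v])  # step 1: top-ranked blog
        members = {seed} | {                           # step 2: mutual neighbours
            v for v in unassigned
            if W[seed][v] >= min_weight and W[v][seed] >= min_weight
        }
        communities.append(sorted(members))
        unassigned -= members                          # step 3: exclude and repeat
    return communities


# Hypothetical mutual awareness matrix: a triangle {0, 1, 2} and a pair {3, 4}
W = [
    [0, 2, 2, 0, 0],
    [2, 0, 2, 0, 0],
    [2, 2, 0, 0, 0],
    [0, 0, 0, 0, 2],
    [0, 0, 0, 2, 0],
]
communities = extract_communities(W, n_communities=2)
print(sorted(communities))  # the groups {0, 1, 2} and {3, 4}
```

Requiring weight in both directions reflects the bi-directional property of mutual awareness described in the analytical framework.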
Performance Metrics • Coverage • Measures the fraction of edges that are intra-community • Higher coverage indicates higher quality • Conductance • Smaller conductance indicates higher quality
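Both metrics can be sketched on a symmetric weighted graph. Coverage follows the slide's description (intra-community edge weight over total edge weight). The conductance formula is not shown in this transcript, so the code uses the standard graph definition, cut(C, V \ C) / min(vol(C), vol(V \ C)), as an assumption. Function names and the sample graph are hypothetical.

```python
def coverage(W, communities):
    """Fraction of total edge weight that lies inside communities."""
    label = {v: c for c, members in enumerate(communities) for v in members}
    n = len(W)
    total = intra = 0.0
    for i in range(n):
        for j in range(i + 1, n):  # count each undirected edge once
            total += W[i][j]
            if i in label and label.get(i) == label.get(j):
                intra += W[i][j]
    return intra / total if total else 0.0


def conductance(W, cluster):
    """Standard conductance: weight leaving the cluster over the smaller volume."""
    n = len(W)
    inside = set(cluster)
    cut = sum(W[i][j] for i in inside for j in range(n) if j not in inside)
    vol_in = sum(W[i][j] for i in inside for j in range(n))
    vol_out = sum(W[i][j] for i in range(n) if i not in inside for j in range(n))
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0


# Hypothetical graph: strong edges 0-1 and 2-3, one weak bridge 1-2
W = [
    [0, 3, 0, 0],
    [3, 0, 1, 0],
    [0, 1, 0, 3],
    [0, 0, 3, 0],
]
cov = coverage(W, [[0, 1], [2, 3]])
cond = conductance(W, [0, 1])
print(cov, cond)  # 6/7 and 1/7
```

On this graph the split {0, 1} / {2, 3} keeps 6 of the 7 units of edge weight inside communities and cuts only the weak bridge, so coverage is high and conductance is low.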
Performance Metrics • Interest Coefficient • Measure how much a community member is interested in his or her assigned community. • Interest coefficient of an individual blogger m toward an assigned cluster Ck • Interest coefficient of a cluster Ck
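The formulas for the interest coefficient are not preserved in this transcript. The sketch below is one plausible reading, stated as an assumption: a blogger's interest in an assigned cluster is the share of that blogger's outgoing awareness weight directed at other members of the cluster, and a cluster's interest coefficient is the mean over its members. The function names and sample matrix are hypothetical.

```python
def blogger_interest(W, m, cluster):
    """ASSUMED definition: share of blogger m's outgoing weight that
    goes to other members of the assigned cluster."""
    total = sum(W[m])
    if total == 0:
        return 0.0
    inside = sum(W[m][j] for j in cluster if j != m)
    return inside / total


def cluster_interest(W, cluster):
    """ASSUMED definition: mean member interest in the cluster."""
    return sum(blogger_interest(W, m, cluster) for m in cluster) / len(cluster)


# Hypothetical mutual awareness matrix
W = [
    [0, 3, 0, 0],
    [3, 0, 1, 0],
    [0, 1, 0, 3],
    [0, 0, 3, 0],
]
print(blogger_interest(W, 1, [0, 1]))  # 3 of 4 units stay inside: 0.75
print(cluster_interest(W, [0, 1]))     # mean of 1.0 and 0.75: 0.875
```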
Performance Metrics • Sustainability • Average sustainability for a set C of k extracted communities
Experimental Results • NEC Blog Dataset • 127,467 entries for a period of 25 consecutive weeks • 584 seed blogs • 40,877 links • 2,898 trackback links
Experimental Results • Compare the communities extracted using two different features • Baseline adjacency matrix (wij = number of entry-to-entry links) • Mutual awareness matrix (wij = mutual awareness score)
Experimental Results • Experimental design
Experimental Results • Sustainability of communities
Experimental Results • Use content of blog entries to validate communities (Figure legend: red = trackback links, blue = entry-to-entry links)
Experimental Results • WWW 2006 Blog Workshop Dataset • 8.37 million entries • 1.43 million blog sites (narrowed down to 122K)