This presentation covers GhostLink, an unsupervised probabilistic generative model that extracts the underlying influence network in online communities based on opinion conformity and analyzes its characteristics. The implicit social influence it captures is used to improve item rating prediction. An efficient Gibbs sampling algorithm empirically demonstrates fast convergence. Large-scale experiments are conducted on 13 million reviews in four communities.
GhostLink: Latent Network Inference for Influence-aware Recommendation
Yuchong Zheng
CONTENTS
01 Introduction
02 GhostLink: Influence-Facet Model
03 Joint Probabilistic Inference
04 Experiment
05 Conclusion
01 PART Introduction
Introduction
• The Motivation of the Paper: The traditional assumption is that similar users have similar rating behavior and facet preferences. Recent works use review content and temporal information to extract further cues. All of these works assume that users behave independently, but in the real world users influence one another.
• Some Works: The first method is to exploit the observed social network or interactions of users. Some recent works use explicit user-user relationships to propose social-network-based recommendation.
• Problem: Some big online review communities, such as Amazon or BeerAdvocate, do not have an explicit social network. So the authors ask whether influence can be detected from other signals: how can we detect influence from online activities alone?
The Method of The Paper
• The Method: In this paper, the authors leverage opinion conformity based on writing style as an indication of influence, and use GhostLink to analyze the problem.
• Conclusion of Figure: Only a few users influence most of the others, and the distribution of influencers follows a power-law-like distribution.
• Goals: Given only timestamped reviews of users in online communities, extract the underlying influence network of who-influences-whom based on opinion conformity and analyze the characteristics of the network. Leverage the implicit social influence to improve item rating prediction.
The Contributions of the Paper
• Model: The paper proposes an unsupervised probabilistic generative model, GhostLink, to learn the influence graph in online communities.
• Algorithm: It proposes an efficient algorithm based on Gibbs sampling to estimate the hidden parameters in GhostLink, which empirically demonstrates fast convergence.
• Experiments: The paper performs large-scale experiments in four communities with 13 million reviews. Moreover, it analyzes the properties of the influence graph and uses it for use cases such as finding influential members in the community.
02 PART GhostLink: Influence-Facet Model
Assumptions and Notation
The goal of the paper is to learn the influence graph between users based on their review content. It argues that influence is reflected by the used/echoed words and facets. The generative process is shown below.
• v is the sampled user (the influencer).
• d′ is the actual review.
• θ is the influencer's facet distribution.
• t_d is the timestamp.
• φ is a categorical distribution.
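To make the generative story concrete, here is a minimal Python sketch of how one review could be generated word by word. It is an illustration under stated assumptions, not the paper's implementation: the function name `generate_review`, the argument layout, the per-facet word distribution `word_dist`, and the uniform fallback when no influence weights are known are all hypothetical choices made for this example.

```python
import random

def generate_review(u, earlier_reviews, pi, psi, theta_user, theta_review,
                    word_dist, n_words, rng):
    """Generate (word, facet z, influence flag s, influencer v) tuples.

    earlier_reviews: list of (v, d_prime) pairs written on the same item
                     before u's review (assumed already sorted by timestamp).
    pi[u]:           probability that a word of u is written under influence.
    psi[u][v]:       weight of v as an influencer of u (hypothetical layout).
    theta_user[u]:   u's own latent facet distribution.
    theta_review[d]: facet distribution of an earlier review d.
    word_dist[z]:    word distribution of facet z (assumed for illustration).
    """
    tokens = []
    for _ in range(n_words):
        # s = 1 is only possible if somebody reviewed the item earlier.
        influenced = bool(earlier_reviews) and rng.random() < pi[u]
        if influenced:
            weights = [psi[u].get(v, 0.0) for v, _ in earlier_reviews]
            if sum(weights) == 0:
                # Assumption: fall back to a uniform choice of influencer.
                weights = [1.0] * len(earlier_reviews)
            v, d_prime = rng.choices(earlier_reviews, weights=weights, k=1)[0]
            facet_dist = theta_review[d_prime]   # echo the influencer's review
        else:
            v = None
            facet_dist = theta_user[u]           # use own latent preference
        z = rng.choices(range(len(facet_dist)), weights=facet_dist, k=1)[0]
        w = rng.choices(range(len(word_dist[z])), weights=word_dist[z], k=1)[0]
        tokens.append((w, z, int(influenced), v))
    return tokens
```

Passing an explicit `random.Random` instance as `rng` keeps the sketch reproducible; the real model learns `pi`, `psi`, and the facet distributions rather than taking them as inputs.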
Conclusion
In summary, the paper points out that a user's review can be regarded as a mixture of her own latent preferences and the preferences of her influencers.
03 PART Joint Probabilistic Inference
Overall Process
Exploiting the above results, the overall inference is an iterative process consisting of the following steps. The paper sorts all reviews on an item by timestamp. For each word in each review on an item:
1. Estimate whether the word has been written under influence.
2. In case of influence (s = 1), an influencer v is sampled jointly in the previous step.
3. Sample a facet for the word, keeping all influencers and influence variables fixed.
The process is repeated until the Gibbs sampling process converges.
Example
Consider a set of reviews written by three users in the following time order: Adam, Bob and Sam (see Table 2). The table also shows the current assignment of the latent variables z and s. The goal is to re-sample the influence variables. Let n(u, s) be the number of tokens written by u with influence variable s, n(d, z) be the total number of tokens with topic z in document d, n(d) be the number of tokens in document d, and n(u) be the total number of tokens written by u.
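The count statistics defined above can be sketched in a few lines of Python. The token layout `(user, document, z, s)` and the function name `count_statistics` are assumptions made for this illustration; the values of Table 2 are not reproduced here, so the usage below relies on made-up assignments.

```python
from collections import Counter

def count_statistics(tokens):
    """Aggregate the counts n(u, s), n(d, z), n(d) and n(u).

    tokens: iterable of (user, document, z, s) tuples, one per word token,
            where z is the current facet and s the current influence flag.
    """
    n_us, n_dz, n_d, n_u = Counter(), Counter(), Counter(), Counter()
    for user, doc, z, s in tokens:
        n_us[(user, s)] += 1   # tokens of user written with influence flag s
        n_dz[(doc, z)] += 1    # tokens in document with facet z
        n_d[doc] += 1          # tokens in document
        n_u[user] += 1         # tokens written by user
    return n_us, n_dz, n_d, n_u

# Hypothetical assignments for two short reviews (not the values of Table 2).
example_tokens = [
    ("adam", "d1", 0, 0), ("adam", "d1", 1, 0),
    ("bob", "d2", 0, 1), ("bob", "d2", 0, 0), ("bob", "d2", 1, 1),
]
n_us, n_dz, n_d, n_u = count_statistics(example_tokens)
```

In a Gibbs sampler these counters would be decremented for the token being re-sampled and incremented again after the new assignment, rather than recomputed from scratch.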
In a community setting, especially for communities dealing with items of fine taste like movies, food and beer, where users co-review multiple items, such statistics are aggregated over several other items. This provides a stronger signal for influence when a user copies/echoes similar facet descriptions from a particular user across several items. Therefore, the paper relies on three main factors to model influence and influencers in the community:
• The vulnerability of a user u to being influenced, modeled by π and captured in the counts n(u, s).
• The textual focus of the influencing review v_d by v on the specific facet z, modeled by θ and captured in the counts n(v_d, z); as well as how many times the influencer v influenced u, modeled by ψ and captured in the counts n(u, v, s = 1), aggregated over all facets and items they co-reviewed.
• The latent preference of u for z, modeled by θ_u and captured in the counts n(u, z, s = 0).
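One way to see how the three factors interact is to multiply smoothed count ratios into a weight for each outcome: "not influenced" versus "influenced by v". The sketch below is only an illustration of that idea. The exact conditional distribution used by the paper is not given on these slides, so the product form, the smoothing constants `gamma`, `rho`, `alpha`, and the function name `influence_posterior` are all assumptions.

```python
def influence_posterior(u, z, candidates, n_us, n_uv, n_dz, n_d, n_uz0, K,
                        gamma=1.0, rho=1.0, alpha=1.0):
    """Return normalized probabilities over {None (s = 0), v1, v2, ...}.

    candidates: list of (v, d) pairs, the earlier reviewers of the item and
                their reviews (assumed: one review per user per item).
    n_us:  counts n(u, s);  n_uv: counts n(u, v, s = 1);
    n_dz:  counts n(d, z);  n_d:  counts n(d);
    n_uz0: counts n(u, z, s = 0);  K: number of facets.
    """
    own = n_us.get((u, 0), 0)
    echoed = n_us.get((u, 1), 0)
    total = own + echoed
    # Factor 1: vulnerability of u (pi), from n(u, s).
    p_not = (own + gamma) / (total + 2 * gamma)
    p_inf = (echoed + gamma) / (total + 2 * gamma)
    # Factor 3: u's own latent preference for z, from n(u, z, s = 0).
    weights = {None: p_not * (n_uz0.get((u, z), 0) + alpha) / (own + K * alpha)}
    link_total = sum(n_uv.get((u, v), 0) for v, _ in candidates)
    for v, d in candidates:
        # Factor 2a: how often v influenced u before (psi).
        link = (n_uv.get((u, v), 0) + rho) / (link_total + rho * len(candidates))
        # Factor 2b: focus of v's review on facet z (theta).
        focus = (n_dz.get((d, z), 0) + alpha) / (n_d.get(d, 0) + K * alpha)
        weights[v] = p_inf * link * focus
    norm = sum(weights.values())
    return {key: w / norm for key, w in weights.items()}
```

A sampler would draw one key from the returned dictionary; `None` corresponds to s = 0, and any other key sets s = 1 together with the influencer v.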
04 PART Experiment
Sources of the Experiment
In order to test GhostLink, the paper uses four online communities in different domains: BeerAdvocate and Ratebeer for beer reviews, and Amazon for movie and food reviews. The article sets the number of latent facets to K = 20. For each community, the symmetric concentration parameters are set as below:
Likelihood, Smoothness, Fast Convergence
• The figure shows the log-likelihood of the data per iteration. The learning is stable, with a smooth increase in the data log-likelihood.
• There is also a clear difference in the log-likelihood when GhostLink takes influence into account.
• The table shows the run time to convergence for the basic and the fast implementation of GhostLink.
Influence-aware Item Rating Prediction
• The next result the paper presents is the effectiveness of GhostLink in rating prediction. Here the paper divides the methods into four groups:
• 1. GhostLink
• 2. Rating + Text-aware
• 3. Rating + Time + Network
• 4. Rating + Time-aware
• The results are shown on the right. They indicate that textual features alone are not helpful; performance improves when more influence-specific features are incorporated.
Facet Preference Divergence
In this part the paper examines whether there is any difference between the latent facet preferences of users and their observed preferences. The paper computes the Jensen-Shannon divergence between the two. The results show that:
1. There is a strong occurrence of social influence on user preferences in online communities.
2. Users are more likely to use their original latent preferences to influence others in the community.
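For reference, the Jensen-Shannon divergence between two facet distributions can be computed as below. This is a minimal sketch of the standard definition, using base-2 logarithms so that the value lies between 0 and 1; the function names and the choice of log base are assumptions, since the slides do not state which variant the paper uses.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 vanish."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon_divergence(p, q):
    """Symmetric divergence between two distributions over the same facets."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]   # mixture distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical latent vs. observed facet preferences of one user.
latent = [0.7, 0.2, 0.1]
observed = [0.4, 0.4, 0.2]
divergence = jensen_shannon_divergence(latent, observed)
```

A value of 0 means the latent and observed preferences coincide; larger values indicate that the observed reviews deviate more from the user's own latent preference.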
Structure of the Influence Network
Finally, the authors analyze the structure of the influence network Ψ to find out how the mass is distributed in the network. For this, they compute a Maximum Weighted Spanning Tree. Its features are shown below. The analysis shows that the majority of the mass of the influence graph is concentrated in giant tree components.
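A maximum weighted spanning tree (or forest, if the graph is disconnected) can be obtained with Kruskal's algorithm run on edges sorted by decreasing weight. The sketch below is a simplification: it treats influence edges as undirected, and the slides do not say which algorithm the authors used, so both points are assumptions, as is the function name `maximum_spanning_forest`.

```python
def maximum_spanning_forest(edges):
    """Kruskal's algorithm on edges sorted by decreasing weight.

    edges: iterable of (u, v, weight) tuples, treated as undirected.
    Returns the list of edges kept in the forest.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = []
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # edge joins two separate components
            parent[root_u] = root_v
            forest.append((u, v, w))
    return forest

# Hypothetical influence weights between five users.
example_edges = [("a", "b", 0.9), ("b", "c", 0.5), ("a", "c", 0.1), ("d", "e", 0.3)]
example_forest = maximum_spanning_forest(example_edges)
```

Summing the weights per connected component of the resulting forest gives the share of influence mass held by each tree, which is the kind of quantity the slide refers to.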
Structure of the Influence Network (continued)
The tree structure on the right shows another characteristic: only a few users seem to influence many others. Next, the paper analyzes whether specific power-law behaviors can be observed. The results are shown below. Also, only a few hubs have very high hub scores.
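A quick diagnostic for power-law-like behavior is to look at the slope of the degree histogram on a log-log scale. The sketch below fits that slope with ordinary least squares. This is a crude check chosen for brevity, not the procedure of the paper (which is not described on the slides), and least-squares fits on log-log histograms are known to be biased; the function name `loglog_slope` is hypothetical.

```python
import math
from collections import Counter

def loglog_slope(degrees):
    """Least-squares slope of log(count) against log(degree).

    degrees: iterable with one entry per user, e.g. how many users each
             user influences. Zero degrees are ignored. Requires at least
             two distinct positive degree values.
    """
    histogram = Counter(d for d in degrees if d > 0)
    xs = [math.log(d) for d in histogram]            # log degree
    ys = [math.log(c) for c in histogram.values()]   # log number of users
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

A clearly negative slope over a wide range of degrees is consistent with a few users influencing many others, while most users influence very few.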
05 PART Conclusion
This paper uses GhostLink to analyze the underlying influence graph in online communities. By capturing implicit social influence, the method improves item rating prediction by 23% over state-of-the-art methods.